Saturday, March 31, 2007

Complexity classes

I just found this excellent explanation of complexity classes P, NP and more, explaining how it works, what it means, and the significance of it all. It's by Scott Aaronson, a theoretical computer scientist at the University of Waterloo. If you thought my last blog post on this subject was pointless (if I'm talking to you, you already know it), read this: it's at least some really cool-looking theoretical computer science and math. As far as I can tell, all it really assumes is knowledge of Big-O notation, which most of my readers probably already have (if not, see Wikipedia and the old blog post of mine linked above). Here's an indecipherable chunk from the lecture that will make perfect sense when you're done reading it:

Likewise, it's not hard to prove that if NP=coNP, then the entire polynomial hierarchy collapses down to NP (or in other words, to coNP). If Σ2P=Π2P, then the entire polynomial hierarchy collapses down to Σ2P, and so on. If you think about it, this gives us a whole infinite sequence of generalizations of the P≠NP conjecture, each one "harder" to prove than the last. Why do we care about these generalizations? Because often, we're trying to study conjecture BLAH, and we can't prove that BLAH is true, and we can't even prove that if BLAH were false then P would equal NP. But -- and here's the punchline -- we can prove that if BLAH were false, then the polynomial hierarchy would collapse to the second or the third level. And this gives us some sort of evidence that BLAH is true.

Welcome to complexity theory!

By the way, you may have noticed that this blog post is a lot shorter and has a lot more links and quotes than most of my past blog posts. Well, in one of my favorite blogs, Language Log, they were discussing French blogs and said,

At least, the half-dozen posts [on some stupid French philosopher's blog] that I've read are all plain-text essays, between 1,000 and 1,500 words each, with no hyperlinks, no quotations, no tables, no pictures. Even when he starts a post with a reference to someone else's writing, he doesn't link to it, and he doesn't even quote it, he just assumes that his readers will have read it and will know what he's talking about. Or maybe he assumes that they won't have read it, but will form an adequate opinion of its content from his discussion, I don't know.

Look, it's a supercomputer taken with
a wide-angle lens!

I can't emphasize this enough, so I'll put it in <em> tags: I am not a French philosopher!. For your further enjoyment, I have included a superfluous picture to the right. Have fun reading a real blog!

Tuesday, March 27, 2007

High school computer science education

I've read so a bunch of rants about why university-level computer science is terrible, but nothing about the high school level. As a high schooler (you read that right), I think this should be given more attention: high school computer science is terrible in at least 90% of schools. I know there's been research as to how to teach children, whether Scheme or Java should be used. But really, that's irrelevant. The relevant thing is, we need better teachers.

Brighton High School is known for being a great school academically. There are four different programs for getting college credit while in high school, including multivariable calculus and linear algebra courses. We ranked in the top 25 in that stupid arbitrary Newsweek survey of high schools, if that means anything. But our computer science is terrible.

It's not for a lack of interest. In 7th and 8th grade, when kids in advanced mathematics started to get graphing calculators (always TI-83s, of course), programming little games became very popular in the primitive BASIC included with the calculator. I wrote a centipede clone and a "racing game", which was more about not hitting the wall than going fast. When high school started, a lot of kids took a computer science elective as freshmen. I decided not to, thinking I wouldn't learn anything. From what I heard from other people, that assessment was accurate. In one semester, meeting every day for nearly an hour, the class learned how to define methods in Java, do simple arithmetic, and at the end, write a program called "bank account" which used two parallel arrays and a simple CLI to let users create a bank account with a name and an amount of money. Using text commands, you could make new bank accounts or deposit or withdraw money. They did this in around 200 lines of code.

Ignoring the fact that, in a more rational programming language, it would be less than 50 lines of code (if that), the real question is, how did they do so little with so much time? It's not that the kids were stupid. It's that the teacher doesn't know anything. The two teachers that have been assigned to teach computers science were both basic math teachers with no real training in software engineering. From what I've heard, they're terribly incompetent, and the only way to actually learn how to program is to learn form the textbook. Algorithms and computational complexity? Data structures more advanced than a primitive array? Those don't come until the "advanced" year-long AP computer science class, and they're only discussed, not implemented. The advanced course was cancelled this year and next year due to lack of demand. Last year, only a couple people did reasonably well on the AP, and these were people who spent class just learning by themselves out of the textbook.

Though it would be nice to use some more abstract language like Scheme or Haskell (or maybe even Factor), that's not really the issue. The issue is that the school doesn't really value computer science as a subject. If they did, there would be a computer science teacher and some thought would go into the curriculum (which is currently just Thinking in Java out of order, with lots and lots of repetition).

In the modern world, computer programming is very important for a large group of people to know. I think it is more important than physics or chemistry because it has a direct use behind it: creating software. I don't have any figures on it, but I'd be surprised if there were more physicists and chemists than there are software engineers. But the core curriculum for students in the United States is homogeneous, fixed and arbitrary: there's English and math (which are good), and then there's a rather nationalistic version of history, and a selection of sciences (physics, biology, chemistry) that is interesting but, for most students, very forgettable. In any case, sciences must be retaken from the beginning at the college level, so high school science study seems almost pointless. By contrast, computer science--like English and math--is often readily applicable if well-taught. I'm not saying it needs to be a mandatory subject, it should just have more attention.

But it's unlikely to get that attention anytime soon as long as it's considered just another elective. For now, I'm doing my best to help my CS-deprived learn more, helping them through SICP and explaining what, exactly, computational complexity is. This is something that is barely covered even in the AP class.

I talked with my principal today about the need for better computer science education in our school. She was nice about it and said she would look into getting more training courses for the computer science teacher, as I suggested. But honestly, I'm not sure if anything will happen.

Update: Hmm, maybe they will listen to me. Later that day, when I wrote this blog post, a different administrator came up to me and said (paraphrased), "The principal told me your ideas about to get classes for computer science through RIT, and I think that's good." So maybe something will happen, even if that's not really what I meant.

Monday, March 26, 2007

Dataflow in programming languages (with some Factor bias)

As I see it, there are three main ways to manage dataflow in a programming language. By this, I'm referring to how data passes both within a procedure. [The simplest one--visual boxes and arrows--is as easy to conceptualize as it is useless in practice, because it quickly gets unwieldy. So I'll ignore it for the purposes of this blog post.] Although it's often conflated in discussion, datflow pattern has nothing to do with other aspects of programming language paradigm. Data flow is often discussed in terms of functional vs. imperative, or applicative vs. concatenative, but this is a huge oversimplification. Advocates of one paradigm tend to claim that it is the best, for some reason or other, but in reality, all methods have applications where they excel and applications where they are pretty awkward.

I'll call the first dataflow method the mutation method. In this method, data is passed around within a function in named, mutable variables. Mutation is probably the oldest dataflow strategy because it directly corresponds to a register machine. So all early (50's and 60's) and most modern programming languages use it. The mutation method is probably what's most familiar to programmers: it's used in everything from C, Java, Basic, Ruby, Smalltalk and Python, to name a few. As an example, consider the definition of factorial in C:

int factorial(int x) {
int n;
for (n = 1; x>0; x--)
n *= x;
return n;

Update:Thanks to Dave for pointing out my extremely stupid errors in this code. It now works.
Though the Smalltalk version would look completely different, Smalltalk and C share the same fundamental property that the easiest way to pass around data in a relatively primitive loop is to mutate the value of variables in place. Self and Io may have some interesting semantics behind the getting and setting of variables, but in the end, it's all the same. The other two paradigms don't work like this. But not all programming languages using the mutation data flow are imperative. For example, Haskell, Lisp and ML can (all in different ways) emulate this sort of dataflow. In Common Lisp and Scheme, the value of any variable can be changed through setq and set!. In ML, mutable variables are implemented as an explicit reference. And in Haskell, there is the State monad and references through the I/O monad. I think it'd be possible to construct a purely functional language using this technique for everything.

The second way of doing things is what Lisp, Haskell and ML usually use: function application returning a value. I'll call this the application method. The first example I know of is John McCarthy's Lisp, created in 1958, a version of which is still in use today. Often, functional languages are defined in this way, but I think whether a language is functional or imperative has more to do with the use of data structures. In these languages, variables are (almost) always immutable, and loops are built up (at least in theory) through recursion. The main way a variable changes value is through function application: calling a function with different arguments "changes" the value of the variables. The same factorial function would be implemented in Haskell (without cheating and using any of Haskell's library or pattern matching) as

factorial n | n <= 0 = 1
| otherwise = n*factorial (n-1)
-- of course, it could more easily be written as
-- factorial n = product [1..n]

Recursion can, of course, be used in mutation dataflow languages, but in application dataflow method, it is usually much more efficient, so it can be used in practical applications. In application values, there tend to be fewer "statements" and more "expressions" (though statements still usually exist for definitions and module inclusion). Within application languages, there are of course differences. In some languages, mutable variables can actually be used (see above), and different forms of recursion are efficient in different application languages. But the overall concept is the same.

The third form of dataflow is point-free (also known as pointless by its detractors). It's definitely the newest kind of data flow at only around 30 years old. I'd estimate (and I'm basing this on absolutely nothing) that there are fewer than 1000 people writing code in this way in the world at any time. Point-free refers to the lack of points, or variables, in most or all code. Instead of variables, data is passed implicitly between functions. There are two ways to do point-free programming, which could be called stack-based and function-based. The progenitor of function-based programming is the late John Backus. It's sad his obituaries mostly said, "He invented Fortran, which is amazing. Then, he did other stuff that nobody cares about for 50 years." In his 1977 Turing Award address, Backus introduced a new programming language which he called FP. In FP, the only way to make programs is to compose functions with the function composition operator o. There are also operators used to apply one value to more than one function, or to map a function over all elements of a list. The problem is, this gets really awkward when you have three or more arguments to a function, or want to do certain other things that are more complex. But this style lives on as a sublanguage of Haskell and J. In Haskell's point-free style, factorial might be defined as

factorial = product . enumFromTo 1

Though, of course, this is cheating, as it uses two library functions defined in an application style.

As I already mentioned, the other style of point-free programming is stack-based, as used in Forth, Joy, Postscript, HP calculator language and Factor. Forth was developed by Chuck Moore in the early 1970s as the next generation language, though it never really caught on as such. Then, around 10 years ago, Manfred von Thun noticed the correspondence between FP and Forth, and created a new language called Joy which combines the strengths. In Forth and Joy, as in Factor, all dataflow (outside of some global variables and arrays) takes place on the stack. The stack can be manipulated with words (the equivalent of functions) like dup, swap and drop in ways that would be impossible in function-based languages like FP. There is more flexibility for more data to be passed around in different ways and treated granularly rather than as a unit. In Factor, factorial is (ignoring the library functions that would make it easier, and ignoring the fact that an iterative version would be more efficient)

: factorial ( n -- n! )
dup 0 <= [ drop 1 ]
[ dup 1- factorial * ] if ;
! Of course a better way of doing it would be
! : factorial ( n -- n! ) 1 [ 1+ * ] reduce ;

In some ways, this is very similar to the application definition, once you can understand the postfix syntax. But the semantics are very different: rather than applying a function to its arguments, you are passing around a single stack to various words. Nearly all words only modify the top couple arguments on the stack, though.

But there are finer distinctions to make. Though Joy and Factor are both stack-based, code comes out looking very different between them. This is because, in Joy's implementation, the data stack is just a linked list which is not mutated by words. In Factor, the data stack is implemented more efficiently, and it is mutated. Though Joy is slower, there are a lot of cool things that come out of this. For example, in Factor, it is a common pattern to dup one or more items on or near the top of the stack in order to evaluate the predicate for a conditional (eg if). In Factor, if takes three arguments. The first argument, found third-to-top on the stack, is a boolean value (Factor works like Common Lisp in that anything but f is treated as true). The second-to-top argument is the quotation to be executed if that value is not f, and the top argument is the quotation to be executed if the value is f. In Joy, the third-to-top argument is not a boolean but rather a quotation. This bottom quotation is executed, and a boolean value is taken from the top of the stack and saved somewhere. Then, the old stack from before that execution is restored, and one of the two following quotations is executed. Joy also uses recursion combinators, taking advantage of this property to factor out the abstract behavior of many recursion patterns. Though this is all nice to work with, it comes at the cost of efficiency, simplicity and ease of implementation.

As you may have realized at this point, point-free languages span the whole spectrum from relatively low-level imperative (Forth, at least at the most basic level) to pure or almost-pure functional (Joy and FP, also Haskell and J when used in this style) to impure functional (Factor, Postscript, Onyx). They can also be statically typed (Cat, StrongForth), untyped (Forth) or dynamically typed (basically everything else). Basically, they can be anything.

So, I'm not sure if this essay actually proved its thesis--that all dataflow strategies have strengths and weaknesses and that dataflow is irrelevant of paradigm --but I hope I've enlightened you a little bit as to what's out there.

Oh, and one more thing. I recently saw a blog post that pretty much claimed Haskell as the modern incarnation of FP. Although Factor doesn't completely hide the Von Neumann machine underneath, code doesn't need to care about it in any real sense. As far as dataflow goes, Factor is much closer to FP than Haskell is, as they are both about composing functions. Just to make it official and unambiguous, I claim Factor as the One True Heir to FP.

Sunday, March 25, 2007

Double dispatch in 24 lines

A friend referred me to this video where Peter Seibel extols the virtues of Lisp over Java at Google. One of the big advantages he cited was multimethods, which are much better than the visitor pattern as used in Java. Though the visitor pattern already comes out more flexible and concise in Factor than Java, I took on the challenge of creating an abstraction over those visitor patterns to avoid repeating code. I wanted something that made this sort of code work (this is a pretty stupid example, though):

VISITOR: ++ ! acts like +, coercing string arguments to a number, unless both arguments are strings, in which case it appends them

V: number string ++
string>number + ;
V: string number ++
>r string>number r> + ;
V: number number ++
+ ;
V: string string ++
append ;

It would compile it to a form sort of like this:

GENERIC: ++-string
GENERIC: ++-number
M: string ++ swap ++-string ;
M: number ++ swap ++-number ;

M: number ++-string
swap string>number + ;
M: string ++-number
swap >r string>number r> + ;
M: number ++-number
swap + ;
M: string ++-string
swap append ;

As it turned out, it wasn't much of a challenge, and the resulting code is about 24 lines long. The main problem with it is that the visitor pattern lends itself to a rather naive method order, but I don't think that's much of an issue in practice. Without further ado, here's the visitor pattern in Factor:

USING: kernel generic syntax words parser hashtables sequences ;
IN: visitor

: define-visitor ( word -- )
dup dup reset-word define-generic
H{ } clone "visitors" set-word-prop ;

CREATE define-visitor ; parsing

: connect-method ( top-class generic method-word -- )
[ swap ] swap add -rot define-method ;

: record-visitor ( top-class generic method-word -- )
swap "visitors" word-prop swapd set-hash ;

: new-vmethod ( method bottom-class top-class generic -- )
gensym dup define-generic
3dup connect-method
[ record-visitor ] keep
define-method ;

: define-visitor-method ( method bottom-class top-class generic -- )
>r >r >r \ swap add* r> r> r>
2dup "visitors" word-prop hash
[ nip define-method ] [ new-vmethod ] ?if ;

: V:
! syntax: V: bottom-class top-class generic body... ;
f set-word scan-word scan-word scan-word
parse-definition -roll define-visitor-method ; parsing

Enjoy! The full module can be found in the Factor repository. The great thing about Factor here is that, whereas you'd have to repeat this pattern every time you use it in Java, Factor lets you write it only once. But then again, comparing Factor to Java to prove Factor is good is somewhat of a straw-man fallacy.

Thursday, March 15, 2007


Warning: this is another one of those boring essays you won't want to read.

My mom always asks me, "if you know so much about computers, why don't you fix them?" I always explain, (1) I don't know that much about computers and (2) the interesting stuff is more theoretical, or at least stuff that's not directly used. So I tried to explain the great theoretical computer science problem of whether P=NP. I tried and I failed. So now I'm going to try my best to figure out how to explain this problem to the layperson, and my attempt is below. I think computer science like this is as useful to everyone as the basic chemistry and physics everyone learns in high school; even if you don't become a programmer, it's still good to know it.

Computers do things by executing a number of steps in order given a certain input. This is called an algorithm. One example of an algorithm is below, the algorithm to calculate factorial:

Step 0: Get the number that we are going to take factorial of, and store it as x.
Step 1: Set n equal to 1.
Step 2: If x=1, the answer is n. The algorithm is now finished.
Step 3: Multiply n by x and store the answer in n.
Step 4: Decrement x.
Step 5: Go back up to step 2.

Let's look at how this works. Try doing the factorial of 1. After going through Step 1, n=1 and x=1. Then, Step 2 makes the algorithm end, returning 1 as the result. If you use 3 as the value that we're taking the factorial of, the results of series of steps is as follows:

Step 0: x is now set to 3.
Step 1: n is now set to 1.
Step 2: This is skipped, because x does not equal 1.
Step 3: n is now set to 1*3=3.
Step 4: x is now set to 3-1=2.
Step 5: Go to step 2.
Step 2: This is skipped, because x does not equal 1.
Step 3: n is now set to 3*2=6.
Step 4: x is now set to 2-1=1.
Step 5: Go to step 2.
Step 2: x=1 so the program stops. n, which now equals 6, is the answer.

Obviously, this can get tiring quickly, which is why we have computers. But this algorithm will always return the right answer for a factorial for a number 1 or greater.

One thing that you might notice is that we can calculate the number of steps this will take: for any input x>=1, it takes 4x-1 steps. Since 4x-1 is a polynomial equation, we can say that this algorithm runs in polynomial time*. There are lots of different operations that computers do that run in polynomial time. In fact, almost everything is run in a number of steps that can be represented with a polynomial equation. The things that aren't run really slowly. The set of all algorithms that run in polynomial time is written as P, which, of course, stands for polynomial.

For the factorial algorithm, the two variables (n and x) only had one value at a time. But what if we said that a variable can hold more than one value at a time, and we would simultaneously see how things go for all the values? This doesn't make all that much sense, but let's look at a possible algorithm using this technique, an algorithm that factors an integer. This is key in cryptography, where the RSA encryption algorithm depends on this being difficult to do in a regular computer, where all variables only have one value at a time.

Step 0: Get the number you want to factor and store it in x.
Step 1: Set n to some integer between 2 and the square root of x.
Step 2: If x is evenly divisible by n, then n is a factor.

So this can find a factor of a number in just 3 steps. It's much faster than the regular way of doing it with a regular computer. This weird computer that can hold more than one value in a single variable (as done in Step 1) is called a non-deterministic computer, since it's not deterministic what value the variable will hold at any given time. It might be possible to do this algorithm in polynomial time on a regular computer, but computer scientists still don't know how.

Now, if you understand that, you're over the hardest part. The set of algorithms that can be done by this weird computer in polynomial time is called NP, which stands for non-deterministic polynomial. The question is, is there anything that we can do on the non-determistic computer in polynomial time that we can't do on the regular computer in polynomial time? Or is the set of algorithms on a regular computer in polynomial time the same as the set of algorithms on a non-deterministic computer in polynomial time? That is, does P=NP?

* Those of you know know about this stuff will tell me, "No, factorial is actually pseudo-polynomial. You're wrong, Dan. Go home." Well, you're right, but it's just for the sake of explanation. For the acute reader who wants to be confused, here's a more in-depth explanation: computers store things in bits of ones and zeros, and a number takes ceiling(log base 2(x)) bits to hold a number. For example, 7 takes 3 bits to hold in a computer. A real polynomial-time formula actually would be able to do its thing in a number of steps that can be represented by a polynomial formula of the size (for example, in bits) of the input. The factorial formula's number of steps is proportional to somewhere around 2^s, where s is the size of the number in bits. But it was just an example.

PS. If you have trouble understanding this, please leave me a comment.

Friday, March 2, 2007

Computers can't program themselves!

Note to coders: If you've done much in programming, you'll probably find this blog entry completely pointless and boring. You'd probably be well not reading it. Also, you may have noticed that my blog entries are way too long. This is no exception.

In 11th grade, I had an excellent economics teacher. He really got me interested in the topic, and helped me take the AP even though it wasn't an AP course. But the one thing that always annoyed me was the claim that, "By 2023, computing power in the world will exceed human brain power. This means that computers will program themselves!" I would always try to tell him, no, that's completely implausible, but could never convince him. When I mentioned this to a friend, he said, "They'll program themselves, but not until 2050." This view is far to common for me to resist refuting it here.

Computers programming themselves is at once impossible and already happening. It's already happening in the sense of compilers and libraries. Because of compilers, programmers don't write in ones and zeros that computers understand; they write in a language that describes things more directly, and then the computer "programs itself," converting that programmer-intelligible code into computer code. Similarly, libraries (reusable source code components) let programmers write drastically less. It's gotten so advanced that, to a large degree, programming is assembling preexisting components.

But it can't get much further than this. Text files for source code may disappear (I doubt it), but the programmer never will. An increase in computing power won't do it. All computers are basically Turing-complete*, so they can already perform every operation that any other computer could possibly perform. This may be at a slower rate, but the same amount of things are possible. No physical device can do more than what a Turing machine (or an equivalent machine, like the computer you're reading this on) can do. Even if, in 2023 or 2050, there is a trillion times more computing power than there is today, no new computations will be available; they'll just be faster. So if it will ever be possible for computers to program themselves any more than they already do, it's already possible. Someone just has to program it.

But advances in artificial intelligence or similar technologies won't make computers program themselves either. No matter what happens, computers still have to be told what to do. What they are told to do could be some complex behavior, such as behaving like a neural net (this isn't as useful as it sounds) or find rules and make decisions based on statistical analysis of a situation and past responses (this is a bit more useful). But the most useful operations have nothing to do with AI or learning. These include things like sorting a list, multiplying two matrices, and displaying a GUI window. These can only be done by a algorithmic process: directly or indirectly, someone has to tell the computer which element in the list to compare to which, what matrix elements multiply with what, or where to place the pixels in the window on the screen. Furthermore, effective programmers need to understand these algorithms, at least a little bit, in order to combine them in an efficient way. There is no way to avoid this eventuality. The best we can do is build detailed, useful libraries, so code only has to be written once, and make a good, high-level programming language so programmers can express their ideas in a simple, logical way.

This isn't to say that programming will always stay the way it is now. Programming in modern languages like Java, Haskell or Factor today is radically different from programming in C or assembler 25 years ago**, and from punch cards 25 years before that. In 25 years, things will be very different, possibly even with new input devices (such as graphical programming or voice commands; these have problems too, but that's another blog post). But computers won't ever program themselves as long as they exist; there will always be someone who has to directly tell them what to do.

As an author, I'm not very persuasive. So to anyone I haven't convinced, I will extend a bet: for any amount, in any finite time frame, I bet you that computers will not program themselves.

* Actually, only computers with an infinite supply of memory are truly Turing-complete, because there is an infinite number of possible algorithms. But in practice, computers can do pretty much whatever we want them to do, and this ability will not change significantly with time.

** Those languages are still used today, but not usually used as the primary language for constructing an application.