Sunday, July 15, 2007

Messing around at the Factor REPL

Read-eval-print loops, or REPLs, are really useful, I've found. One of my favorite uses, beyond prototyping, is playing around and solving problems. Often these are math problems, but slightly more complicated ones need string processing. Simple string processing tasks like this one also make an easy example for explaining Factor to beginners like you. (If you're not a Factor beginner, you may want to stop reading.)

My younger brother is learning how to type right now. My dad found Tux Typing, a GPL typing program for Windows and Linux. So far, it's great; there's just one problem: when you use the word mode (where you have to type words before they fall to the bottom of the screen, creating horrible crashing sounds) there are only about 35 words in the long word setting. My dad asked me to fix this, and since it was a simple little task, I agreed.

I started by figuring out the file format for the dictionaries. It was easy: the first line was the title (I chose "Huge words") and after that, the words were listed in all caps, one per line, with Unix line endings. Next, I copied the text of the Wikipedia entry "Economy of the United States" into Wordpad and saved it to the desktop. Then I downloaded Factor to the computer I was working on and fired up the REPL.

The first step in making the word list is getting a list of space-separated tokens. So I made a file reader object and got an array of all the lines. I joined these lines with a space, then split the result on spaces, which separates both the lines and the words on the same line.
"article.txt" <file-reader> lines " " join " " split
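
For readers more at home in Python, here is a rough sketch of what that join-then-split step does. It is only a comparison, not the Factor code itself: the sample `lines` list and the helper name `tokens_from_lines` are made up for illustration. One thing it makes visible is that doubled spaces and blank lines leave empty strings in the result.

```python
# Hypothetical stand-in for the lines read from article.txt.
lines = ["The economy of the", "United States  is large.", ""]


def tokens_from_lines(lines):
    """Join the lines with a single space, then split on single spaces.

    Splitting on exactly one space (rather than on runs of whitespace)
    mirrors the step described above, so consecutive spaces and empty
    lines produce empty-string tokens.
    """
    return " ".join(lines).split(" ")


tokens = tokens_from_lines(lines)
print(tokens)
```

The empty strings are harmless here, because a length filter on the tokens throws them away.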

Now an array of words is lying on top of the stack. There are a bunch of operations we need to do to manipulate this, and Factor's sequence combinators help make it easier. So I made sure that each word had at least three letters in it:
[ length 3 >= ] subset

And I put all the words in upper case:
[ >upper ] map

And I made sure that each character of each word was an upper case letter, to filter out apostrophes, numbers, and other similar things:
[ [ LETTER? ] all? ] subset
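
The three filtering steps above can be sketched in Python as well. This is an approximation under one stated assumption: that `LETTER?` accepts only the ASCII letters A to Z. The helper names and the sample tokens are invented for the example.

```python
def is_upper_ascii_letter(ch):
    # Assumption: LETTER? means "is one of the ASCII letters A-Z".
    return "A" <= ch <= "Z"


def clean(tokens):
    """Keep tokens of 3+ characters, uppercase them, keep letter-only ones."""
    long_enough = [t for t in tokens if len(t) >= 3]
    uppercased = [t.upper() for t in long_enough]
    return [t for t in uppercased if all(is_upper_ascii_letter(c) for c in t)]


# Hypothetical sample input, including the junk the filters should drop.
sample = ["The", "economy", "of", "U.S.", "2007", "it's", "", "GDP"]
print(clean(sample))
```

Each list comprehension corresponds to one quotation passed to `subset` or `map`; the intermediate names only exist because Python has no stack to leave the array on.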

And finally, I made sure there were no duplicates:
prune

So, to join this together in the correct file format and add a title, I used
"Huge words" add* "\n" join

yielding a string. To write this string back to the original file, all I needed was
"article.txt" <file-writer> [ print ] with-stream
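
The remaining steps (dropping duplicates, putting the title first, joining with newlines, and writing the file) look roughly like this in Python. Again this is a sketch for comparison, with assumptions: the function names are hypothetical, duplicates are removed keeping the first occurrence of each word, and the output goes to a temporary file rather than article.txt.

```python
import os
import tempfile


def build_word_file(words, title="Huge words"):
    """Drop duplicate words (first occurrence wins), then put the title first."""
    unique = list(dict.fromkeys(words))
    return "\n".join([title] + unique)


def write_word_file(path, contents):
    """Write the contents plus one trailing newline, using Unix line endings."""
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        f.write(contents + "\n")


# Usage with a hypothetical word list and a throwaway output file.
contents = build_word_file(["THE", "ECONOMY", "THE", "GDP"])
out_path = os.path.join(tempfile.mkdtemp(), "huge_words.txt")
write_word_file(out_path, contents)
print(contents)
```

Passing `newline="\n"` keeps Python from translating line endings on Windows, which matters because the dictionary format expects Unix line endings.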

And that's it! All in all, it was 10-15 minutes' work, and I got the game working with a good word list. The REPL made it a lot easier.

Update: I didn't have the <file-reader> and <file-writer> displayed properly before, but it's fixed now. Thanks Sam!

1 comment:

Sam said...

Looks like <file-reader> and <file-writer> aren't properly displayed, as you forgot to escape < and > in your blog post.