Sunday, July 22, 2007

ICFP results

A bunch of people have asked me how the ICFP programming contest is going, so I thought I'd just write a blog post about how it all went. Just to give you a spoiler, in case you don't want to read the whole thing: we lost.

But if you're going to keep reading, let me describe the task to you. It was described in a 22 page document, so if you weren't in the contest, chances are you didn't read the whole thing. The premise is, an alien crash-landed on earth, and we have to prefix his DNA with some more DNA in order to help him survive in our conditions. His DNA works differently than ours, consisting of the letters I, C, F and P. It is converted into alien RNA through a slow and potentially non-halting algorithmic process. Then, the RNA (in chunks of 7 bases) is executed, with each chunk interpreted as one of 20 commands, or ignored. These commands are interpreted to produce a 600x600 picture. The specification for the DNA>RNA conversion and RNA interpretation is really complicated; I remember falling asleep at 2:00 AM on Friday (Saturday?) reading it. The default DNA produces this, and the goal of the contest is to come up with a prefix for that which will produce this, or at least a good approximation of it, in as little bases as possible.

So, obviously, Gavin (gmh33) and I thought, the first step is to produce a system which converts DNA into the image. By the end of the second day, we were mostly done. Some teams got this done in the middle of the first day, but (a) we're not perfect programmers and (b) we still had lives going on, so it took us a bit longer. Gavin wrote most of the DNA>RNA converter, and I wrote the RNA interpreter with lots of help from dmpk2k from #concatenative. For me, it was a nice opportunity to learn about basic graphics algorithms like flood fill, alpha blending and line drawing. (I might put these in a coherent, efficient library some time later, but right now, they're pretty incoherent and inefficient.) For Gavin, it was a good chance to learn about Factor programming and all of its oddities and idioms.

I asked Gavin to put the alien's DNA into his code to convert it to RNA, so I could then render it into a picture, hopefully the source picture. Eight hours later, it was still going. Eventually, Gavin figured it out. If I'd been working alone, I'd probably never realize it. "I kind of have a feeling we approached this the wrong way... I think maybe going backwards would have been the right way." Indeed, according to the spec itself, the DNA goes through more than a million iterations before fully transforming into RNA. So, most likely, the DNA>RNA conversion wouldn't've been done by the time the contest was over!

Our new strategy was simple: I'd write the code to convert the target picture into corresponding RNA instructions, and then Gavin would convert that into a DNA prefix. Within a couple hours, Gavin had his code working--it only needs to add three I's before each RNA instruction, and then skip all the following DNA code. But mine was much harder. I wrote code for moving the cursor around, setting the bucket color to any RGB value, and drawing lines. I thought there was just one more step: code to look at the target picture and figure out where lines should be drawn so I can flood fill the solid patches of color. Dmpk2k (a 2D graphics genius with infinite tolerance for people who don't understand anything) and I discussed the possible algorithms for this for over an hour, until we both simultaneously realized, in horror, that there were no solid regions of color!

At this point, Gavin and I decided to give up. In total, we'd written about 500 lines of code, and it all worked, though it didn't solve the problem. So we weren't going to just not submit anything--we were going to lose in style. Gavin came up with the idea of drawing a big green F for Factor, and before long I sent him the RNA, which he converted to DNA and submitted. Surprisingly, we didn't come in last place. Maybe it was the white background, but we apparently got a few pixels right, somehow. At the end of the contest, team Raptors is in position 354, or fourth to last of those who submitted something.

All in all, a horrible failure which I'll be sure to take part in next year. For both Gavin and I, it was a very humbling experience. It'll be a little while until I'm a good enough programmer, but eventually the world will know that Factor is the programming of choice for discriminating hackers!

Update: I should have mentioned this in the original post. After doing everything, I realized that our approach was far too simplistic. A good solution would not have taken the segmented approach we did, separating the image to RNA processing from the RNA to prefix processing from reading the DNA. To have a good solution, we would have had to do all of those things at once. We would have had to analyze the DNA to see how it related to the picture, and then produce DNA which, over several iterations, would make the correct RNA to generate the picture. I have no idea how to do this. To test things, we'd need a DNA interpreter that was fast, and for that, we'd need a highly optimized, well-thought-out implementation in a fast, low-level language, like C. It might be possible in a higher level functional language, but only if many low-level features are taken advantage of.

Update 2: After reading a bunch of other post-mortems, I realize that, in fact, the organizers of the ICFP made a whole library and programming system in DNA, which turns out to be a Turing-complete (though not terribly efficient) programming language. So you get a series of puzzles, which lead to more puzzles, and so on. Oh, and to represent DNA efficiently, you can't use plain old strings, you need an obscure functional data structure developed at Xerox PARC called a rope (like heavyweight strings, get it?). It's a specialized tree of characters that do concatenation in constant time and most other operations (indexing, substring, etc) in logarithmic time. For searching though a string, the Knuth-Morris-Pratt (KMP) algorithm is needed, which detects and takes advantage of patterns to get the complexity down from O(mn) (where m is the length of the pattern and n is the length of the string) to O(m+n). The DNA contains a lot of repetitions, and the conversion to RNA uses a lot of searching, so this is necessary.

Since someone asked, I uploaded Gavin's and my code to onigirihouse (thanks Matt!) for all to see the horrors. The icfp.factor may be a little outdated, though, and it doesn't contain the code used to convert RNA into DNA. We just communicated our code by email, and that wasn't always the most efficient or effective way to do things. The code isn't terrible, but it's really inefficient and there are bugs in it. Also, I had some messy stack handling in a few parts of image.factor because I didn't think things through very well.

3 comments:

cashto said...

You know, the self-check string that they give at the bottom of the contest materials? Turns out it had a higher score than "dark Endo". A lot of people got 319th place or higher with a score of 3583008 with this 28-byte string.

Daniel Ehrenberg said...

Yeah, I know. We considered submitting that string, but decided it'd be cooler if we submitted something of our own. We weren't going to win, so why not show off our DNA prowess by (not really) by making our own string? I bet some of those people that placed with the 28-byte string didn't even understand how DNA and RNA worked to produce an image. Not to say that we were far ahead...

Chris Smith said...

We ended up doing something similar. We threw away the DNA -> RNA converter, and wrote a prefix that discarded the original DNA and built a new image. The only thing different was that I write a parser for a language with drawing commands, which outputs RNA; and I write a decompressor in DNA, so that we could pick the 16 RNA commands that we liked most, and encode the RNA in two letters per command and then just add the decompressor prefix. We submitted the night-to-day prefix and tied with everyone on that, though; our grand scheme of starting from scratch never scored nearly so well. By the time we gave up, we'd gotten 15% of the pixels right.