Thursday, July 16, 2009

A simple interprocedural optimization

An important feature of a compiler is that it compile code fast enough so that you don't feel like you're waiting forever. For this reason, most optimizations stop at the boundary between words.

If one word calls another word, then the callee, in general, doesn't get the benefit of any information collected about what its arguments look like. And the caller doesn't get any information about what might be returned. There are exceptions to this, for specific words that have special optimization behavior in the compiler. For example, class predicates interact specially with the propagation pass to fold to constants whenever the compiler can prove that they'll always evaluate to true or false.

One optimization the compiler works hard to do is eliminating type checks and runtime generic dispatch. It likes to turn virtual method calls into direct jumps, both because this is faster and because it enables further optimizations. Type inference in the SCCP pass is what drives the elimination of dispatch.

But type inference has to stop at procedure boundaries, in general. We can't know all of the possible inputs to a word, since it can be called from anywhere, including the listener. And it would take too much time for callers to trace through every procedure they call to see what they can deduce about the output from what they know about the input.

On the other hand, there would sometimes be a lot of benefit for callers and callees interact to perform optimizations. It'd be especially helpful for things words like append which would benefit the most from type inference. append consists of many generic calls (to length and nth-unsafe), and the dispatch can be eliminated if the types of the inputs are known. Additionally, the type of the output follows from the types of the inputs, and

Maybe interprocedural analysis is too much work in general, but for something like append, it would be helpful to have versions specialized for several different types, which are used when the type of the inputs is known. I implemented in Factor a simple system where words can be annotated to do this. The code is in the Factor pastebin; this is just a prototype and needs some changes before it's fully read to use.

With this system, to make the word append automatically create specialized versions based on the types of its two inputs, you can use the declaration
SPECIALIZED: append 2

This doesn't immediately compile a ton of different versions of the word. Instead, it compiles them "on demand", whenever the propagation pass finds that append is used with certain types.

When I applied this to the nbody benchmark, part of the Programming Language Shootout, by making certain words in math.vectors, the running time went from around 4.3 seconds to around 4.0 seconds. This is a modest gain, but it's on top of something already highly optimized--there is some code which gives the compiler special knowledge of how to run vector operations on the kind of array used in the benchmark. I hope that this technique can make most of that code go away.

No comments: