Tuesday, July 14, 2009

Some new compiler optimizations

I've been working on Factor's optimizing compiler, adding a few new simple optimizations. I've made call( and execute( do more inlining, extended dead code elimination, increased the number of cases where overflow checks can be eliminated, and made object instantiation fast in more cases. Here, I'll explain what the optimizations are and how they're implemented.

Inlining with call( and execute(

call( -- ) and execute( -- ) are words which let you call quotations or execute words. Slava explained them in a blog post. They differ from call and execute in that they don't require that the word or quotation is available through combinator inlining. But they require an explicit stack effect to be given, to ensure that it takes and returns the right number of parameters. This is nice, because the versions with a stack effect have an additional safety property: they'll only run with if the code has the right stack effect.

Until now, these combinators carried with them a perfomance penalty over using call or execute with known quotations. The penalty is that the stack effect of the quotation must be checked, at runtime, to match the stack effect of the callsite. With an optimization I implemented, the performance is the same when the quotation is known. With matching performance in this case (they're both completely free in the case where either would work), it should be easier to write code that uses the checked versions.

For example, the following implementation of an absolute value function compiles down to the same code that you'd write with the normal if combinator.

:: iff ( x cond true-quot false-quot -- x' )
cond
[ x true-quot call( x -- x' ) ]
[ x false-quot call( x -- x' ) ] if ; inline
: abs ( n -- abs(n) )
dup 0 < [ neg ] [ ] iff ;


This is implemented as part of the sparse conditional constant propagation (SCCP) pass in Factor's compiler. call-effect and execute-effect have custom inlining behavior there, which takes advantage of information collected by the propagation pass. If enough is known about the quotation at this time (if it is a literal quotation, or a literal quotation with something (literal or non-literal) curried on to it, or two literal quotations composed together) so that the stack effect can be inferred and the code can be inlined, then it is inlined.

I could have implemented this as a transform in the stack checker, but this strategy gives a stronger optimization, since it can interact with everything in constant propagation. For example, it interacts with method inlining. This will help improve performance in the Factor Smalltalk implementation, where previously combinator inlining would have been impossible without special support from the Smalltalk-to-Factor translator.

Eliminating overflow checks

In Factor, unlike C and Java, calculations on integers never overflow. Instead, numbers that are too big are converted to a representation that scales to arbitrary size. The smaller integers which are faster to calculate on are called "fixnums" and the larger ones, which are slower to use, are called "bignums"

Factor's compiler does a lot of work to convert general arithmetic to arithmetic on fixnums when possible. One thing the compiler does is try to infer that integers will be in a small enough range that no overflow can happen. This is part of the SCCP pass.

Another compiler pass checks if a value is only used in places where overflowing doesn't matter. For example, if the code + >fixnum is run in a context where it's known that its arguments will both be fixnums, then it doesn't matter if an overflow check is done because of how modular arithmetic works.

I extended this pass to take into account a couple other words that have this property that >fixnum has. Specifically, the words which set memory at a given pointer address, like set-alien-unsigned-1 and take integers as arguments. Depending on the word size of the platform (32 vs 64 bits), this optimization is valid for a different set of words.

One time that this comes up is in v+n when called with a byte array and a fixnum. Adding two fixnums gives back something that can be either a fixnum or a bignum, but the form of addition without an overflow check can be used since the result is going to be stored back into a byte array. Storing into a byte array is implemented with set-alien-unsigned-1, so the optimization applies.

Inlining for new

The word new instantiates a tuple with default values for all slots, given a tuple class. By default, this is done dynamically: the tuple class does not need to be known before runtime. But usually it is known ahead of time. If it is known ahead of time, then code can be inlined which instantiates the specific tuple that is there.

This was previously implemented as part of the stack checker. I moved it to the propagation pass, which makes the optimization stronger. I plan on moving more transforms like this (for example, the one for member?) to the propagation pass.

[Update: I did this, adding a utility called define-partial-eval. Its interface is identical to define-transform, but it operates on the propagation pass IR. Transformations which don't need to interact with stack checking should use define-partial-eval rather than define-transform, since it creates a stronger optimization.]

Better dead code elimination

I extended dead code elimination in the low-level IR to be more accurate. Now, it realizes that an allocation is dead if it is only written to, and never read from. With this, together with alias analysis, the code 2array first2 compiles into a no-op, and no allocation takes place. (This isn't optimized out by tuple unboxing in the high-level IR, because arrays are mutable.)

2 comments:

Zeev said...

For the lay person, what is the percent improvement on language shootout style benchmarks? Is Factor beating SBCL yet?

Daniel Ehrenberg said...

Zeev,

No, these are just small steps that make only certain code run faster. I think they're worth it, though. Most of these optimizations will make more code applicable to the low-level optimizations. (In Factor compiler insider-speak, they make more things "infer".)

There are some other things in the works which will have a bigger impact; you'll be hearing about them from Slava and me as soon as we implement them. This was just my first crack at getting a little bit done in the compiler.