Saturday, October 10, 2009

Bitfields in Factor structs and the special style

Sometimes, it's useful to pack multiple small numbers into a small amount of memory. Say you have a tuple like
TUPLE: nums a b c ;
where you know that a is always between 0 and 31, b is always between -2 and 1, and c is always either 0 or 1. It should be possible to store this in a single byte, since a could be represented in 5 bits, b can be represented in 2 bits, and c can be represented in 1 bit.

One way to implement this is to define words to store in a single fixnum what a nums tuple stores. These words would consist of shifts, masks and maybe calls to a sign extension routine. But this code would be boring and annoying to write.

I've automated this process by allowing Factor structs to have bit fields, like C structs can. Here's how the implementation of nums would look:
STRUCT: nums
{ a uint bits: 5 }
{ b int bits: 2 }
{ c uint bits: 1 } ;
Thanks to Joe Groff's changes on how structs work, this is a class, with prettyprinting and accessors just like tuple classes. Structs were originally a feature of Factor for the foreign function interface, but they're actually useful in pure Factor programs too: they're an efficient way to manipulate binary data.

Bitfields deviate from the FFI origins of structs because they don't follow any standard convention on bitfield layout. In C, unlike for structs, there is no single standard ABI for bitfields, even on a single OS/architecture platform. Different compilers act differently. So why not make up my own convention? I store everything little-endian, and the first bitfield stores in the least significant bits of the byte. Because bitfield access only reads the underlying memory one byte at a time in the current implementation, this is just as efficient on big-endian hardware as little-endian hardware.

Factor supports efficient homogeneous arrays of structs, allowing lots of data to be packed together efficiently. Because I extended structs for bitfields, rather than creating a new type of class, struct arrays can immediately be used with structs that have bitfields. This worked immediately; I didn't have to modify struct arrays.

The actual code for the setters and getters is implemented in a high-level style. There are no type declarations, and all math is done with generic arithmetic calls. Factor's compiler is smart enough to translate this into fixnum arithmetic with no overflow checks in the case that the bitfields are small enough. If you make a field 100 bits wide, it'll use generic integer arithmetic, to take into account possible overflow into bignums. But this will only be used for certain calculations, the ones that could overflow.

The Factor code generated for accessors is naive, not only in how it doesn't declare types, but it also does some things that could be special-cased for more efficient code. For example, in every byte read of the array, a mask is applied to the result with bitwise and, so that the relevant bits can be extracted. Sometimes, that mask will be 255, so it won't actually mask away any bits, and does nothing. Rather than special-casing a mask of 255 and generating code that doesn't call bitand, I extended the compiler to understand the algebraic identity that, in the range of numbers from 0 to 255, 255 bitand is the identity function. (This works for (2^n)-1, for any n.)

Not all code can be compiled as efficiently for the compiler as the bitfield accessors. There is a special style of code that the compiler can compile more efficiently. As the compiler becomes more advanced, the subset grows. Really, there's a continuum of programs that the compiler can compile more or less efficiently. Originally, the special style consisted of code whose stack effect could be inferred by the compiler, allowing optimizations to take place. Now, all valid Factor code has an inferrable stack effect, but the compiler is advanced enough that it can do further optimizations when more information is available.

Code written in the special style has to have something indicating all of this information about the code. In the case of bitfield accessors, we want the compiler to be able to infer that it can use fixnum arithmetic without overflow checks if the bitfield is small enough that the result won't ever overflow a bignum. The compiler can figure this out in the case of bitfield getters because it is only doing shifts, bitands and bitors on the result of alien-unsigned-1, a word to read one byte of memory. The shifts are all constants.

In the sparse conditional constant propagation pass, the compiler tracks the possible interval of values that a virtual register could hold. The compiler understands how bitands, bitors and shifts by a literal relate the interval of their output to the interval or their input. Another pass, the modular arithmetic pass, propagates information backwards about the modulus that earlier calculations could be done in, based on how the data is used later. This also allows overflow checks to be eliminated.

This aspect of the special style allows overflow checks to be eliminated, allowing the use of more direct machine arithmetic. Other aspects of special style allow other efficient code to be generated. On code where types can infer, and a float or SIMD type is used, the compiler can allocate float or SIMD registers, and inline efficient code for arithmetic operations. Code using immutable tuples get the benefit of tuple unboxing, so repeated lookup of slots is cheap, and sometimes allocation can be eliminated. Code that uses mutable datastructures gets unboxed at another point in the compiler, but since it is harder to reason about, unboxing right now has only been implemented within a restricted piece of code. When the class of the input to a generic word is known, the correct method is called directly, and sometimes inlined. There are many other optimizations that have been implemented over the years, too many to describe in this blog post.

Not all code could be written in the special style, and it would be unreasonable to expect most Factor programmers to learn about it. The compiler and the UI framework, for example, would be difficult to translate into a form taking heavy advantage of style. But a lot of the code in the standard library could be written this way. For example, the code for appending sequences is written in a style that allows the inner loop to have no subroutine calls inside of it and register allocation to be performed, when operating on certain types of sequences. There is only one piece of code that implements append, but using the hints feature lets specialized versions be generated for certain types which are often appended together. Append is used in many places, so code that's written in a dynamic style will benefit from this, even if the dynamic style code isn't optimized better.

Special style is an extremely important feature of Factor, and without it, Factor would be as slow as many scripting languages. Without special style, many libraries would have to be implemented in C, as scripting languages do. Because of special style, Factor can be self-hosting and use many libraries written in Factor without an overwhelming performance penalty. Without special style, to implement bitfields as fast as they are right now, it would have been necessary to generate machine code on each platform.

Update: I guess I forgot to tell you exactly what special style is, and how to write code in it. Well, it's complicated. I'll write about it in the future. Special style grows as the compiler becomes more advanced, but I can't describe how it is right now.