Friday, June 15, 2007

Matrices and packed tuple arrays

In Factor, it's possible to make homogeneous arrays of unboxed tuples pretty easily and efficiently. The advantage of this is space. On the tuple TUPLE: point x y z ; where the delegate is known to always be f, a tuple array yields a 3:1 space savings. With a smaller tuple, the savings could be even more. Additionally, fewer long-lived objects are created, which could help GC performance.

OK, some background. In Factor, when you want your own type/record/"class" (for some of your classes, not all), you make a tuple. A tuple is basically an array with a class tag as the first element, a delegate as the second element, and record fields in the rest of it. You can define a tuple, as previously mentioned, like TUPLE: point x y z ;. That makes a tuple class called point with three fields, point-x, point-y and point-z. Those words can be used to access the fields, and prepending them with set- sets the fields destructively. The word <point> is defined (by default) to create a point object, given an x, y and z. For object orientation purposes, it's possible to define methods on the point class, and the word point? tests if an object is a point. Additionally, all tuples have a delegate, but this is often just f, which does nothing. Delegates are another post, though...

So, in the point tuple, there's a lot of metadata that each instance needs to hold. First, it needs an object header, which says that it's a tuple with 5 slots. The first slot always contains the word point. The second slot, since no delegate is needed, will always contain f. After that, there's the three numbers. If you have an array of 1000 points, there's no real reason why each point should store all this information; it's always the same! So, why not have an array which holds a bunch of x, y, z instead?

The naive approach would be to have an array of arrays. Each array within the array would have length 3 and contain x, y and z. To get the point out of the array, you just index the main array and convert the inner 3-array into a point tuple. This approach would work fine, but it would still hold unnecessary extra data. For one, there is still some metadata remaining--you still have all these arrays, whose length of three must be stored. Why bother? What if, instead, you stored an array of { x y z x y z ... } and the tuples could simply be implied in the position?

There are two aspects of this. The first aspect is creating an implied structure over a flat array--the interpretation of { x y z x y z ... } as { { x y z } { x y z } ... }. This is provided by the groups class, defined in the splitting vocab in the core. The second step is interpreting { x y z } (the individual inner arrays) as a point. [Note: this involves a sort of lazy virtual map, which could be generalized as a virtual sequence (which wouldn't compile yet). This might be something interesting to investigate in the future.] So there will be two virtual sequences: a matrix (like groups, but with using slices for efficiency and set-nth to allow mutation) and a tuple map, which interprets the matrix slices as tuples. The latter will delegate to the former.

The implementation of all of this is fairly straightforward and simple; it's around 60 lines of code. In Factor, there was no special introspection going on with respect to tuples. There were just a few things I needed: in the classes vocabulary, the word class gives the class of any object, including tuples. In the tuples vocabulary, the word >tuple converts any sequence into a tuple, using the first element of the sequence as the class, the second element as the delegate, and the rest as the slots. Note that, because we're using >tuple to create the tuple, the normal constructor is never called. In the same vocab, the word tuple>array converts a tuple into an array, with the structure previously described.

As I mentioned before, if there is no delegate (that is, if the delegate is known to be consistently f) then there's no reason to store it. So when making a packed tuple array, the delegate should be skipped only if there's no delegate. This means there are three things that need to be known about the tuple: the tuple class, length, and whether a delegate needs to be stored. The length is stored in the underlying matrix, but the the other information must be stored somewhere in the packed array itself. I decided to store them together in what I called an example tuple. Basically, the example tuple is a tuple of the sort that the tuple array will store. It is an instance of the matching class, and it has a delegate if the delegate should be stored. For the point tuple, an example could be T{ point }, which is equivalent to f f f <point>.

One important difference to notice the mutation semantics. By mutation I mean changing a tuple in place. If you index the tuple array and then mutate the resulting tuple, the mutation will not be reflected in the array unless it is explicitly written back with set-nth. This means that tuple arrays are not completely equivalent to regular old arrays which happen to all have the same kind of tuple in them. Fortunately, this isn't much of a problem in practice. Factor is an impure functional language, so oftentimes, tuples aren't mutated after they're created. Of course, if you mutate something that's within a tuple slot, that mutation will be propagated, since that object actually exists in the array.

So, to sum it up, it's really cool that Factor can do this so easily. My explanation might have been confusing, but the code itself is pretty simple. Additionally, both the matrix and the homogeneous tuple array follow the sequence protocol, so immediately, all sequence combinators (like map) can be used on them. One major caveat is that this is a little slower than a regular array: on my computer, it takes around 40 microseconds longer to index than a normal array to index, or about three times as long as it takes to allocate the point tuple itself. One reason for this may be redundant bounds checking in the creation of slices in the matrix.

But I hope this library useful. If you've seen anything like this in any other programming language, I'd be really interested to hear about it. It could be possible in Common Lisp with vector-backed structs, but then you can only use those structs. In scheme, Scheme, you could always make your own record system, as in Lisp. And in C/C++, it's sorta builtin; just make an array of structs (rather than pointers to structs). But otherwise, how could you make this in another language?

The code is in my repository in tuple-arrays.factor. It can be loaded with the line USE: tuple-arrays.

Update: I previously wrote extra/2d to have the two dimensional arrays, but Slava later extended groups to have all of this functionality (though it's not perfect).

No comments: