Friday, October 26, 2007

Factor's object system

In my FAQ, I wrote that Factor was object-oriented. That might have confused some of you: how is Factor object-oriented when code isn't organized in terms of classes like in Java? When there is no system for message passing*? When tuples can't inherit from each other?


Originally, Factor didn't have an object system. It was said that it's not necessary; we can get by with just words, conditionals and lists. But as the library grew more complicated, it became clear that we needed a cleaner method of dispatch. This took the form of generic words, or words that can be overloaded by the class of one of their arguments. Initially, we only had predicate classes, and generic words were in fact no more than syntactic sugar for cond. But we also needed a good way to define data types, so that they wouldn't be confused with each other. Later, we realized the need for a system of 'inheritance' from tuples to tuples, flexible constructors, extensible unions and other things. Piece by piece, features were added to form a emerging coherent whole which could be used to organize some Factor code.

It's important to note that most words are not methods and not all data is stored in tuples. Groups of Factor code are put in vocabularies, not in classes. It's common to write a lot of code without using any object-oriented features (aside from generic words defined in libraries). Nevertheless, object orientation, when used appropriately, makes code much better organized.

The class hierarchy

In Factor, everything is an object. Well, of course everything is an object; everything is something! But I mean everything is an instance of the class object. Under that class, there are a number of built-in classes, representing basic things like tuple, vector, array, fixnum, float, bignum, ratio, quotation, curry, etc. You can test if an object is in a particular class with words of the scheme object?. To test if an object is a tuple, use the word tuple?.

On top of these builtin classes, there are a number of abstractions. For example, the class number is the union of all numeric classes. Under this is a numerical tower of subclasses, similar to Scheme. An example of a couple unions of this kind:

UNION: integer fixnum bignum ;
UNION: rational ratio integer;

Factor also supports extensible unions, called mixins. Whereas a union is closed, later code is allowed to add things to a mixin. The above code can be translated into mixin syntax as:

MIXIN: rational
MIXIN: integer
INSTANCE: rational ratio
INSTANCE: rational integer
INSTANCE: integer fixnum
INSTANCE: integer bignum

This is useful for things like sequences and assocs, which form mixin classes. Unions are inappropriate here, as new sequences and assocs can be made in the library.

Going in the other direction, classes can be narrowed down using predicate classes. A predicate class is a subclass of some other class consisting of instances which satisfy a conditional. For example, the natural numbers (defined as integers 0 or greater) could be defined as

PREDICATE: integer natural 0 >= ;

Generic words

So what use are all these classes? We can use predicates on them to test membership, but that's not all that useful. The centerpiece of the object system is generic words, or words which dispatch on the class of their arguments. Typically, only single dispatch is needed, so multiple dispatch is not in the core**. A generic word which dispatches on the top of the stack is defined with the parsing word GENERIC:. Methods, or implementations of the generic word, are written with the parsing word M:.Take the following useless example:

GENERIC: process ( thing -- processed )
M: string process
"foo" append ;
M: integer process
5 + ;

Here, there is a generic word process with two methods: one on strings (which appends "foo") and one on integers (which adds 5). This is equivalent to:

: process ( thing -- processed )
{ [ dup string? ] [ "foo" append ] }
{ [ dup integer? ] [ 5 + ] }
} cond ;

But the first version of process has the advantage that it is extensible later, in other files. Additionally, generic words are sensitive to the class hierarchy and will test things in an appropriate order, and dispatch is more optimized than with a simple cond expression.

There are other ways to dispatch besides GENERIC:. Other method combinations include MATH:, for words which dispatch on builtin math types for two arguments, HOOK:, to dispatch on the class of the value of a variable, and GENERIC#, which allows dispatch on items further down on the stack. You can also make your own method combinations.

Tuples, delegation and constructors

It's possible to construct your own data types using just arrays and predicate classes, but that can get awkward and ambiguous. To make new data structures, Factor offers a tuples, basically another name for structs or records. Here is an example of a tuple class called foo, with slots bar and baz, together with a constructor called <foo>:

TUPLE: foo bar baz ;
C: <foo> foo

The first line creates the tuple class, and the second line makes the constructor. (Slava described the details and rationale for the constructor system.) It is simple to use this. To make a new foo tuple, just get the values of bar and baz on the stack and call <foo. With that object, the bar slot can be read with the word foo-bar the baz slot can be read with the word foo-baz. To write values to those slots, use the words set-foo-bar and set-foo-baz. And, of course, to test if an object is a foo tuple, use the word foo?. All in all, TUPLE: is a wildly unhygenic macro, and we like it that way!

But wouldn't it be useful if we could let tuples inherit from each other? We've already seen that Factor allows for general subtypes and supertypes, but this does not extend to tuples. The way that tuples can be based off each other is through delegation: one tuple delegates to another. Each tuple has a slot, called the delegate (accessed with delegate and set-delegate). When a generic word dispatches on a tuple, but it doesn't have a method implementation for that tuple, it looks in the delegate slot for what to do next. If there is a delegate, then the generic word is again invoked on that delegate, with the hope that generic word itself has an implementation for that class. If not, it's on to the next delegate until there is no delegate anymore--the delegate is f.***

Instead of inheriting from a class, tuples delegate to an instance. That is, if I have my tuple TUPLE: foo bar baz and I want it to inherit from TUPLE: bing quux ;, all I have to do is make an instance of foo and an instance of bing, and set the delegate of the foo to the bing. Slot accessors like bing-quux and set-bing-quux are generic words implemented, by default, only on that one tuple class that they were made for. So, if you call bing-quux on a tuple of class foo that delegates to a bing, the foo instance will fail to respond and the generic word will be called again on the delegate, bing. This delegate will know what to do, and it'll return its quux value.

There's a subtle problem here with delegates. Look at the following code:

TUPLE: foo bar baz ;
TUPLE: bing quux ;
M: bing who how ;
M: bing how drop "bing" print ;
M: foo how drop "foo" print ;

You might expect that when who is called on a foo instance, "foo" is printed, but in fact, "bing" is printed. This is called the slicing problem: the object is "sliced" when using delegates, so only part of the object is represented in further method calls. On one hand, the behavior of M: bing who is more predictable, but on the other hand, information is lost. It's not a horrible thing, just something to watch out for.

Use cases

As I said before, a lot of Factor code has no need for object orientation. Going by my own experience, some complicated libraries, such as Unicode and inverse, don't yet use the object system at all. The XML-RPC vocabulary relies heavily on generic words to perform dispatch in turning Factor data into XML-RPC format, though no new classes are defined. The XML library, however, uses tuples to represent XML at various stages of processing, and dispatches on the class of these tuples to provide complex functionality. Several tuples are defined in the XML library to represent exceptions, though any object can be thrown as an exception.

In general, the object system should be used when it's useful and not used when it's not useful. That seems simple enough, but it is a radical departure from the philosophy of other languages where all procedures are methods. It's good to use whichever strategy you think will produce the most maintainable, clear code.

The future

For a while, I hesitated to write about the object system because it changes so frequently. Tuple constructors, for example, changed very recently to eliminate the default constructor and replace it with a more flexible system. Mixins are also a recent addition, yet they significantly change the way default methods are implemented. Soon, Slava plans to replace delegation with tuple inheritance.

Since the Factor object system is constructed completely in Factor, rather than being built in to the virtual machine, any changes to it can be done within Factor itself. This is a huge advantage. One change that I proposed could easily be written in Factor and run without any special care. Fundamental changes to the object system can even be interactively tested and debugged!

* Eduardo Cavazos has written a message passing object system in Factor, and he uses it for some of his code. Factor allows other object systems to be created, but with his system, not everything is an 'object' that can receive messages. Still, it's an interesting (and surprisingly terse) example of the flexibility of Factor, and of useful syntax extension. It is not widely used, however.

** I implemented double dispatch as a library called visitor (since it's implemented with the visitor pattern, yielding a simple lexicographic method order). Maybe in the future we'll have double dispatch in the Factor core, to better implement words like like, which end up approximating double dispatch using conditionals. Though it makes sense in theory, I can't imagine any use for triple dispatch in Factor.

*** As you can see, if you make an infinite (that is, recursive) delegate chain, method resolution may not halt. But why would you ever want to do that? Anyway, since we have predicate classes, it's already not guaranteed to halt!

Blogging is much more fun than writing essays about the history of South Africa.


Elie Chaftari said...

The Factor FAQ and now the Factor's object system. A very interesting and useful read. Two excellent articles back-to-back, Daniel.

wtanksley said...

Very nice, and a very impressive language.

But I disagree that the slicing problem in delegation is a minor problem... I would have mentioned it, except that I didn't know about it :-).

Slicing is minor, but what Factor is doing here isn't delegation at all, it's actually consultation, and is far less powerful than delegation.

I strongly recommend that you or Slava read up on delegation and consider whether upgrading to it might improve Factor (rather than switching to class-based inheritance).

A good place to start is:

Delegation is notable as the mechanism that gives Io its compact power.

Daniel Ehrenberg said...


About the terminology of delegation vs. consultation, I didn't know that, but I'll leave the article to reflect the current usage in the Factor community. You're very right that Factor's mechanism is very different from Io's or Self's object-based system. Looking at that link you provided, it looks like using true delegation rather than consultation might fix most of the problems that we have right now.

I strongly encourage you to try to implement delegation or something like it, to see how it would work out. An implementation would involve altering the definition of perform-dispatch and the words it calls. More specifically, the solution might be in changing empty-method, though altering this alone won't fix the problem.