Tuesday, April 6, 2010

A couple language design ideas

I know I shouldn't really write about something I haven't implemented yet, but here are a couple ideas that I've been talking about with some of the other Factor developers that I'd like to share with you. Credit for these ideas goes mostly to Slava Pestov and Joe Groff.

Multimethods in Factor

Factor should have multimethods. They'd clean up a lot of code. There already are multimethods, in extra/multimethods, but these are extremely slow (you have to search through half of method list to find the right one in the average case, which is unacceptable). They could also have their syntax cleaned up a bit. These multimethods, as they're already implemented, combine dispatching on things on the stack with dispatching on the values of dynamically scoped variables. The latter is called 'hooks' in Factor, and is useful for what other languages do using conditional compilation.

Here's a sample of what the syntax might look like:
! Before the -- is upper bounds on the allowable methods
! after the -- is a type declaration that is checked.
GENERIC: like ( seq: sequence exemplar: sequence -- newseq: sequence )

! Parameters with specific types in the parentheses--this is not a stack effect
M: like ( array array ) drop ;

! The single parameter refers to the top of the stack
! and the second element isn't used for dispatch
! so it defaults to sequence
M: like ( array ) >array ;

GENERIC: generate-insn ( instruction: insn -- | cpu ) ! after the | is the hook variables

! Before the | comes from the stack, after is variables. x86 is a singleton class.
M: ( ##add-float | cpu: x86 )
[ dst>> ] [ src1>> ] [ src2>> ] tri
double-rep two-operand ADDSD ;

I've implemented some of this syntax in the multimethods branch of my git repository. Another cool part, language-design-wise, is that multimethods can replace a few different other language features.

For one, they can replace Joe Groff's TYPED:. That lets you declare the types of arguments of a word, and have those be automatically checked, with checks removed if the compiler can prove they're unnecessary. In the ideal design, : will replace TYPED:, and if any parameters have type declarations, then the colon definition is macro-expanded into a generic word with only one method. This implements the type checking.

Multimethods could also be used to implement 'hints' better than they are right now. Hints are a feature of the compiler where a programmer can instruct the compiler to create specialized versions of a function for certain parameter types. Currently, hints use a very inefficient dispatch mechanism: when you call a word with hints, it goes down the list of specialized implementations of the word, testing the stack to see if the contents of the stack are members of those classes. But if efficient multimethod dispatch were worked out, this could be much faster. Also, method inlining would make it so that the hints dispatch would be eliminated if types are known by the compiler. This isn't done right now.

Callsite specialization could also be implemented with the help of multimethods. The difference between this and hints is just that a new hint would be generated if type inference found certain types for the arguments of a function, but there wasn't already a hint for that type. The basic idea for callsite specialization is that the specialized versions would only be used when the compiler can prove that they're appropriate. But if efficient multimethod dispatch were used, then callsite specialization could be used to generate hints that are also available on values whose types are unknown until runtime. Runtime type feedback could work this way too.

There are some implementation challenges in making multimethods fast. From what I've read, it seems like the best way is something like this algorithm by Chambers and Chen used by the Cecil compiler. It generates a dispatch DAG for each multiple-dispatch generic word, where the nodes are single dispatches and the leaves are methods. The algorithm described directly in the paper looks a little impractical, since it includes iterating over all classes, but I think if you just iterated over classes that are method arguments, together with the intersections of these, then that'd be enough.

There's a problem, though, which is that Factor's object system is complicated, and it's not obvious how to get the intersection of classes. The current implementation is probably buggy; if it's not, then I can't understand why it works. So class algebra will have to be fixed first, before multimethods can work. I hope I can figure out how to do this. I bet it has something to do with Binary Decision Diagrams, since we're talking about intersections, unions and negation. We might want to make things simpler by factoring predicate classes out in the way that the Cecil paper does. Either way, things are complicated.

A protocol for variables

I never would have expected it, but the stack-based programming language Factor uses tons of variables. And lots of different kinds, too.
  • Lexically scoped variables in the locals vocab
  • Dynamically scoped variables in the namespaces vocab
  • Two types of globals, using VALUE: and using words for dynamically scoped variables in the global scope
  • Thread-local variables in the thread vocab
  • For web programming with Furnace, there are session variables and conversation variables and scope variables
These are all useful and necessary (though maybe we don't need two kinds of globals), but it's not necessary that they all use different syntactic conventions. They could all use the same syntax.

Some of these variables always correspond to words (locals and values) and some of them don't. Probably it'd be better if all variables corresponded to words, to reduce the risk of typos and to make it possible to have terser syntax to use them, the way we have terse syntax to use locals and values.

Here's my proposal for syntax. Whenever you want to use a variable (unless it's a local), you have to declare it beforehand in some vocab. For example, the following would declare a dynamically scoped variable foo and a thread-local variable bar. Currently, you wouldn't declare these ahead of time, and instead you would just use symbols for these purposes.
DYNAMIC: foo
THREAD-LOCAL: bar
Then, to read the values of these variables, the following syntax would be used
foo
bar
Reading a variable should just be done by executing the word that is the variable. This is how it works in almost all programming languages. Since the variable has been declared ahead of time, we know what kind of variable it is and how to read it. There could be a single parsing word for writing to variables, such as to: (this is what values uses right now), as in the following.
4 to: foo
Again, since the variable is declared ahead of time, we know how to write to it, and the right code is generated at parsetime. We could also have combinators like change, written in a way that works with all variable types:
[ 1 + ] change: foo

Another change I'm interested in is the scoping rules for dynamically scoped variables. Right now, when with-variable is invoked, a totally new scope is made such that if any dynamic variable is set, that setting is only kept as long as the innermost with-variable call. These semantics were made back when the fundamental construct for variables was bind, but in modern Factor code, this isn't so important anymore.

The current semantics are a frequent source of bugs. Basically, it's the funarg problem all over again: if you have a higher order function which includes a with-variable call, and if the quotation sets any variables, then these variable settings don't escape the higher order function call. This is usually not what the programmer intended! There are various hacks, euphemistically called idioms, used in Factor to get around this.

The semantics should really be that set modifies the binding in the scope that defined the variable, with the original binding always coming from with-scope. But this change would break tons of Factor code that already exists, so I don't expect it to be used any time soon. Also, it would make things less efficient if each variable binding had its own scope, since right now things are often grouped together.

One of my far-out hopes with variables is that maybe dynamically scoped variables could be optimized so that they would be as efficient as lexically scoped variables. But this would require lots of changes to Factor's implementation. For example, right now no level of the Factor compiler knows anything about any kind of variables; they're either subroutine calls or stack shuffling by the time the compiler sees it. Still, a sufficiently smart compiler could translate a usage of make into a version that stores the partially built sequence on the stack, and there have been compilers that do the equivalent of this.

5 comments:

zimbatm said...

Regarding symbols, I believe it would be helpful to distinguish them symbolically from other method calls.

I'm a distant factor follower and read some of the code you people produce. Given you imported a dozen of libraries, method definitions rapidly start looking like word soup. If symbols where prefixed by a colon for example, it would be all that, that I don't have to look-up. BTW, this idea comes from the ruby language.

Alex J. said...

Would

4 to: foo

and

4 \ foo to

be equivalent?

Daniel Ehrenberg said...

zimbatm,

I think that would best be accomplished through an unenforced naming convention. A colon is probably a bad idea, since that character already has at least three meanings in Factor, but maybe a dollar sign. You'd do

DYNAMIC: $foo
foo
4 to: foo

Alex J,

That's definitely a possibility, that the to: parsing word would have a runtime equivalent called to. Like maybe to: x would expand into \ x to . It's unclear, though, if we need that feature exposed to users (to be able to write to variables where the variable name isn't known ahead of time). In the case that you know the variable name is known ahead of time, I think the syntax I proposed in the article is preferable, though that's really a matter of opinion.

Anonymous said...

Two fuzzy recollections, perhaps relevant:
(1) You can't count on being able to define words inside a locals clause.
(2) Word definitions don't commit until end of file.

The combination made using Factor as a backend for a scheme-like language prohibitively painful.

Daniel Ehrenberg said...

Anonymous,

The semantics of Factor and Scheme bindings are significantly different that you won't be able to just rely on Factor locals and word definitions. But it's definitely possible to get Factor working as a Scheme backend. Feel free to email me for specific advice on fixing your problems (or even better, you could ask the mailing list or the IRC channel).