Saturday, June 30, 2007

Concatenative pattern matching

Finally, after more than two years of flopping around in my brain, I've implemented a kind of concatenative pattern matching in Factor. Not sure how useful it is, but it was fun. And this blog is called useless Factor for a reason: because I explain useless stuff. I'm not giving software engineering lectures here.

Anyway, it's not really specific to Factor; it'd be fine to implement in Joy too, or basically any other dynamically typed concatenative language where quotations can be manipulated as lists/arrays (though the outcome probably wouldn't be as flexible in Joy, because of Factor's word properties). The basic concept is simple, just requiring a rethinking of pattern matching. Maybe I misunderstood it, but I always thought of ML-style pattern matching as doing two things: matching numeric literals for equality and examining instances of algebraic datatypes. Based on these principles, Chris Double made a great pattern matching library, currently residing in extra/match. But I've always had a problem with it, due to its heavy reliance on named variables. That doesn't detract from its utility; it just bothers me.

A more concatenative solution requires a reanalysis of the problem. Viewed more generally, pattern matching is the inverse of computation. What do I mean by that? Well, take a simple example in Haskell:

data List a = Nil | Cons a (List a)

sum :: Num a => List a -> a
sum Nil = 0
sum (Cons hd tl) = hd + sum tl

example :: Int
example = sum (Cons 4 (Cons 2 (Cons 5 Nil)))

(If this doesn't make sense, you should read this terse but good introduction to Haskell.)

So, you can think of sum as "undoing" the Cons function when it was called, yielding the arguments. And when Nil was used, it "undoes" that instead (returning no arguments). When you think of pattern matching in this way, it becomes easier to translate things into a useful form for concatenative languages.

Here's a simple, useless Haskell example of the simplest form of pattern matching, just plain old undoing:

Cons 1 x = Cons 1 2

This sets the variable x to 2. It does this in a number of steps: first, it constructs the Cons 1 2, evaluating the right hand side. Then, it moves on to the left hand side. It checks that the value is a Cons; otherwise it fails. Next, it destructures the Cons, giving the values 1 and 2. With these values, it tries to match the 1 to 1 (succeeding) and the 2 to x (binding x to 2). With slightly altered code, the match would fail, yielding a runtime error:

Cons 5 x = Cons 1 2

Obviously, 1 does not match 5, so this fails.
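The steps above can be traced in a little Python sketch (not Factor or Haskell, and the Cons class and error messages are mine, just to make the sequence of checks explicit):

```python
class Cons:
    def __init__(self, hd, tl):
        self.hd, self.tl = hd, tl

value = Cons(1, 2)                     # evaluate the right hand side first

# now match against the pattern Cons 1 x, step by step
if not isinstance(value, Cons):        # constructor check
    raise ValueError("match failed: not a Cons")
hd, tl = value.hd, value.tl            # destructure into the two slots
if hd != 1:                            # literal check
    raise ValueError("match failed: 1 does not match")
x = tl                                 # bind the variable
print(x)                               # 2
```

Swap the literal check to `hd != 5` and you get the runtime error instead of a binding, just like the altered Haskell code.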

So it looks like the operation here is an unconditional undo. How could we implement this in a concatenative language? It's simple: just reverse the quotation and undo each word in order. Let's take the simple example above and translate it into Factor:

TUPLE: cons hd tl ;
1 2 <cons> [ 1 swap <cons> ] undo

This constructs a cons cell, and then undoes the construction operation. It matches the cons-hd to 1 and the cons-tl is left on the stack at the end of the undone operation. If, for some reason, we didn't have a cons cell, or if the cons-hd wasn't 1, there would be a runtime error, something along the lines of "Unification failed".

So, the inverse of [ 1 swap <cons> ] is the inverse of <cons>, followed by the inverse of swap, followed by the inverse of 1. The inverse of <cons> just checks to see if the value is a cons. If yes, it returns the values of the slots (except the delegate). If no, it fails. The inverse of swap is, simply, swap. And the inverse of 1 is to pop the top of the stack, failing if it is not 1. So, all put together, the inverse code could be written as

: undo-cons-swap-1 ( cons -- tl )
    dup cons? [ fail ] unless
    [ cons-hd ] keep cons-tl ! inverse of <cons>
    swap                     ! inverse of swap
    1 = [ fail ] unless ;    ! inverse of 1

(In reality, it's a bit more complicated when you include unbound variables, so you could undo, for example, dup drop, but let's not get into that just now.)
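The whole recipe, reverse the quotation and run each word's inverse, can also be sketched outside Factor. Here's a hypothetical Python model (the names `undo`, `inv_cons`, and so on are mine, not from the library), where a quotation is just a list of words:

```python
class UnificationFailed(Exception):
    """Raised when an inverse word fails to match."""

class Cons:
    def __init__(self, hd, tl):
        self.hd, self.tl = hd, tl

def inv_cons(stack):
    # inverse of <cons>: check the tag, then push the slots back
    obj = stack.pop()
    if not isinstance(obj, Cons):
        raise UnificationFailed
    stack += [obj.hd, obj.tl]

def inv_swap(stack):
    # swap is its own inverse
    stack[-1], stack[-2] = stack[-2], stack[-1]

def inv_literal(n):
    # inverse of pushing a literal: pop the top and demand equality
    def inv(stack):
        if stack.pop() != n:
            raise UnificationFailed
    return inv

def undo(quotation, stack):
    # undo a quotation: run each word's inverse, in reverse order
    for word in reversed(quotation):
        word(stack)

stack = [Cons(1, 2)]                   # what 1 2 <cons> builds
undo([inv_literal(1), inv_swap, inv_cons], stack)
print(stack)                           # [2] -- the tail is left behind
```

Here the quotation already holds the inverses; a fuller model would store the forward words and look their inverses up (in Factor, via word properties).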

A second operation is testing if match succeeds or fails, returning a boolean. Imagine the Haskell code

cons1 :: List Int -> Bool
cons1 (Cons 1 _) = True
cons1 _ = False

Below, I'll discuss another way to implement this, but one thing that's possible is to just have [ 1 swap <cons> ] matches?. This attempts to match the item at the top of the stack to the undone operation. If it succeeds, it returns t; otherwise, f. Regardless of the success or failure, everything else left on the stack by the undo operation is dropped.

The only real challenge here is what to do with errors. A unification failure should be caught and returned as f, while, say, a stack underflow error should be propagated. In Factor, recover catches all exceptions indiscriminately. To have this behavior, I had to explicitly rethrow errors which weren't unification failures.
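In the Python model, that error discipline is an except clause which catches only the unification failure and lets everything else through (again, all the names here are my own):

```python
class UnificationFailed(Exception):
    pass

def undo_literal_1(stack):
    # inverse of pushing the literal 1; popping an empty stack
    # raises IndexError, standing in for a real stack underflow
    if stack.pop() != 1:
        raise UnificationFailed

def matches(pattern, stack):
    # run the undone pattern, reporting success as a boolean
    try:
        pattern(stack)
        return True
    except UnificationFailed:
        return False           # a failed match is just False
    finally:
        stack.clear()          # whatever the undo left behind is dropped

print(matches(undo_literal_1, [1]))    # True
print(matches(undo_literal_1, [5]))    # False
# matches(undo_literal_1, []) would propagate IndexError,
# the way a stack underflow should propagate in Factor
```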

Imagine this syntax in Factor for the original Haskell sum function:

TUPLE: nil ;
TUPLE: cons hd tl ;

: sum ( list -- sum )
    {
        { [ <nil> ] [ 0 ] }
        { [ <cons> ] [ sum + ] }
    } which ;

: example ( -- num )
    4 2 5 <nil> <cons> <cons> <cons> sum ;

which is the name I chose for a pattern-matching case: it chooses which of the options matches the data. It's like a Haskell case statement. For each 2array, it tries to undo the first part. If it succeeds, the associated quotation is executed. Otherwise, it moves on to the next 2array. The code it compiles to looks something like this:

: sum ( list -- sum )
    [ [ <nil> ] undo 0 ] [
        [ [ <cons> ] undo sum + ]
        [ no-match ] recover-fail
    ] recover-fail ;

where recover-fail is a word which is like recover but propagates all errors which are not unification failures.
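A hypothetical Python version of which makes that control flow concrete: try each undone pattern in order, and on a unification failure restore the stack and fall through to the next clause (all names below are my own invention, not the library's):

```python
class UnificationFailed(Exception):
    pass

class Nil:
    pass

class Cons:
    def __init__(self, hd, tl):
        self.hd, self.tl = hd, tl

def undo_nil(stack):
    # inverse of <nil>: check the tag, push nothing
    if not isinstance(stack.pop(), Nil):
        raise UnificationFailed

def undo_cons(stack):
    # inverse of <cons>: check the tag, push the slots
    obj = stack.pop()
    if not isinstance(obj, Cons):
        raise UnificationFailed
    stack += [obj.hd, obj.tl]

def which(cases, stack):
    # try each { pattern, body } clause; on a unification
    # failure, restore the stack and move on to the next one
    for pattern, body in cases:
        saved = list(stack)
        try:
            pattern(stack)
        except UnificationFailed:
            stack[:] = saved
            continue
        return body(stack)
    raise UnificationFailed("no clause matched")

def list_sum(stack):
    def nil_case(s):
        return 0
    def cons_case(s):
        tl, hd = s.pop(), s.pop()
        return hd + list_sum([tl])
    return which([(undo_nil, nil_case), (undo_cons, cons_case)], stack)

print(list_sum([Cons(4, Cons(2, Cons(5, Nil())))]))   # 11
```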

This should all generate relatively efficient code. But even if it didn't, I'd still be glad I got this idea written out, because it's just conceptually interesting.

The code is at extra/inverse if you want to look at it. I haven't written documentation yet.

Note: If you are a Haskeller, you were probably horribly offended by my explanations of Haskell pattern matching. For a correct description of what's going on, see this. I oversimplified and made it sound a bit more imperative than it is.

PS. The reason I say this is concatenative pattern matching is that patterns can be concatenated (backwards). For example, [ 1 swap <cons> ] undo is the same as [ <cons> ] undo [ 1 swap ] undo, which is the same as [ <cons> ] undo [ swap ] undo [ 1 ] undo. This grouping is associative. So it follows the laws of concatenative languages, as outlined by Manfred von Thun. By comparison, pattern matching along the lines of extra/match is applicative, applying constructor functions to arguments, which are constants or variables. My goal here was really more to show that concatenative pattern matching is possible rather than to build something useful.
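In the Python model from earlier, that factoring is a one-line check: undoing the whole quotation gives the same result as undoing its pieces one at a time (helper names are hypothetical, as before):

```python
class UnificationFailed(Exception):
    pass

class Cons:
    def __init__(self, hd, tl):
        self.hd, self.tl = hd, tl

def inv_cons(stack):
    # inverse of <cons>: check the tag, push the slots
    obj = stack.pop()
    if not isinstance(obj, Cons):
        raise UnificationFailed
    stack += [obj.hd, obj.tl]

def inv_swap(stack):
    # swap is its own inverse
    stack[-1], stack[-2] = stack[-2], stack[-1]

def inv_lit(n):
    # inverse of pushing a literal: pop and demand equality
    def inv(stack):
        if stack.pop() != n:
            raise UnificationFailed
    return inv

def undo(quotation, stack):
    for word in reversed(quotation):
        word(stack)

# [ 1 swap <cons> ] undo, in one piece
a = [Cons(1, 2)]
undo([inv_lit(1), inv_swap, inv_cons], a)

# [ <cons> ] undo [ swap ] undo [ 1 ] undo: same result
b = [Cons(1, 2)]
undo([inv_cons], b)
undo([inv_swap], b)
undo([inv_lit(1)], b)

print(a == b == [2])   # True
```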


Chris Double said...

Interesting post. How does it scale for cases that pattern matching is often used for, like analysing the AST from a parse?

For example, the matching code in this post:

Daniel Ehrenberg said...

Here's an untested translation of the first interpreter:

: eval1 ( a -- a )
    {
        { [ <lit> ] [ ] }
        { [ <inc> ] [ eval1 1+ ] }
        { [ <isz> ] [ eval1 zero? ] }
        { [ <iff> ] [ rot eval1 ? eval1 ] }
        { [ <pair> ] [ [ eval1 ] 2apply 2array ] }
        { [ <fst> ] [ eval1 first ] }
        { [ <snd> ] [ eval1 second ] }
    } which ;

To me, this is a little cleaner since you can use the regular stack idioms instead of having to refer to the local variables. Still, I like the generic word version better.

Chris Double said...

I agree, it looks very nice. Nice job!