I've shown a few parsing words on this blog that create words that aren't mentioned directly in the text. An example of this is
TUPLE:
, which makes a class with a number of slots, a constructor, a predicate, accessors and setters. It's very useful, even if not all words it creates are given literally in the code. Now, what would you say if I told you I have an unhygenic macro that creates O(n^m) words? (I'll tell you what m and n are later.) In reality, it makes more than n^m words. With 7 as m and n, it creates 1419760 words, and with 8 as both of them, Factor crashes because it runs out of memory after expanding the heap a number of times.
You're probably thinking, "Why would I ever want such a stupid unhygenic macro? And what does it do? Your introduction was so boring and pointless, Dan." Well, I'll tell you: it's a new stack shuffling syntax.
dmpk2k
on
#concatenative
asked me about alternative shuffle word syntax. For example, why not make
dup
into
x-xx
, swap
xy-yx
, etc? It'd be much easier to remember, and there would be no need for "stack idioms" like
swap rot
for
xyz-zyx
. Even complicated stack shuffling would be simple.
Believe it or not, it's not all to hard to do in Factor. It'd be prohibitively difficult to extend the parser to make this work for
all possible shuffles, so I took a halfway approach: provide a way to define as many as you want. The syntax I came up with was
SHUFFLE: abcd 6
which defines shuffle words for up to 4 inputs labeled a, b, c and d (in order), and up to 6 outputs, defining a paltry 6686 shuffle operators, everything from
abcd-
to
ab-baabaa
. (If you want, you can alias them, as in
: tuckd xyz-yxyz ;
, but there's generally little point.) Some of the shufflers do the same thing as others, such as
ab-a
and
a-
, but this constitutes a very small fraction of the total. Going with the variables at the beginning of the article, n is the length of the alphabet (here, abcd) and m is the maximum output length (here, 6). The total number of transformations is higher than 4^6 (4096) because that count only includes the longest transforms, from four elements on the stack to six. However, it is still an accurate asymptotic approximation.
So, there are two difficult programming problems here, in addition to some easy ones. The hard ones are (a) generating all possible stack transformations and (b) turning a stack transformation into a quotation.
Generating all stack transformations(a) may look simple, but it's a bit more complex than it seems. It is easy to get the possible inputs: it's just somewhere between the first element of the alphabet and the entire alphabet. But the outputs can be any number of the inputs, in any order, duplicated or not.
This can be reduced to a simpler problem. Let the length of the alphabet be A, and the length of the output be O. Given an alphabet length and array length, generate an array of arrays consisting of all the possible combinations of numbers 0 to A-1. The length of the array generated will be O^A because there are O spaces, each of which can be filled with A different items. So if we can find a bijection between the numbers from 0 to (O^A)-1 (let's call one of these numbers N) and the arrays we need, the problem is easy.
As it happens, the structure of this problem is remarkably similar to converting numbers to different bases. We can treat our number N as a base A number. Then we create an array of length O consisting of the digits of the number. Each digit should be between 0 and A-1. If we convert each number N into this base, we'll get every possible combination for the output array. The code for this is below, where
translate
converts a number into an array with a given length and alphabet size, and
(combinations)
creates all possible combinations for the given length and alphabet size:
: translate ( alphabet out-len n -- seq )
-rot [ drop /mod ] map-with nip ;
: (combinations) ( alphabet out-len -- seq[seq] )
2dup ^ [ translate ] map-with2 ;
To make an array of all the possible stack shuffles with input length from 1 to max-in and output length from 1 to max-out, we need a couple more words. To represent a stack shuffle, I used a 2array whose first element is the number of inputs and whose second element is an array of the output indices.
: combinations ( n max-out -- seq[seq] )
1+ [ (combinations) ] map-with concat ;
: make-shuffles ( max-out max-in -- shuffles )
[ 1+ dup rot combinations [ 2array ] map-with ] map-with concat ;
(Going counter to the Forth/Factor mantra, I actually wrote this code in a top-down fashion. First, I wrote
make-shuffles
, and then realized I'd need
combinations
, and then realized I need
(combinations)
and
translate
. This is generally not a good tactic, but it works for small problems that take a lot of reasoning, like this.)
Generating quotationsIt surprised me, but (b) turned out to be the easy one. At first, to transform the stack, I stored the top few elements in an array, then accessed the elements used for output and dropped the array. But that isn't optimal, because an array is unnecessarily allocated. It's slightly more efficient if we just pass things around on the stack.
To make this easier, I used Chris Double's
extra/shuffle
library. It defines words like
[npick]
and
[ndrop]
which generate quotations for arbitrary depth pick and drop. Using this, I can take the strategy of reaching down and pulling up the last element of the output array, then dipping beneath it. After iterating backwards through the entire output array, I drop the inputs and come back up. That wasn't too good an explanation, so just have a look at the code:
: shuffle>quot ( shuffle -- quot )
[
first2 2dup [ - ] map-with
reverse [ [npick] % \ >r , ] each
swap [ndrop] % length [ \ r> , ] times
] [ ] make ;
As an example, the code for
ab-ba
is
over >r dup >r 2drop r> r>
.
Wrapping it all upTo make this work, I also need a few trivial operators. There are words for turning a stack shuffle datum into a string:
: shuffle>string ( names shuffle -- string )
[ [ swap nth ] map-with ] map-with
first2 "-" swap 3append >string ;
For attaching a stack effect to a shuffle word:
: put-effect ( word -- )
dup word-name "-" split1
[ >array [ 1string ] map ] 2apply
<effect> "declared-effect" set-word-prop ;
And for defining a set of shuffle operators (
in-shuffle
and
out-shuffle
make sure that the operators get defined in their own minivocabulary, so they're not exported by default. This follows the example of
<PRIVATE
and
PRIVATE>
):
: in-shuffle ( -- ) in get ".shuffle" append set-in ;
: out-shuffle ( -- ) in get ".shuffle" ?tail drop set-in ;
: define-shuffles ( names max-out -- )
in-shuffle over length make-shuffles [
[ shuffle>string create-in ] keep
shuffle>quot dupd define-compound put-effect
] each-with out-shuffle ;
And finally, the dirty macro itself, the greatest sin to hygiene Factor has ever seen:
: SHUFFLE:
scan scan string>number define-shuffles ; parsing
Enjoy! I'm not sure whether it would be ethical for me to release this evil into the Factor distribution, so I haven't pushed it on to my repository.