: Any 0 ;
: Extend 1 ;
: L 2 ;
! ...
But later, in constructing the table (discussed in more detail below), I found the need to write arrays like
{ Control CR LF }
But in Factor, this doesn't work the way you might think it does. It's basically equivalent to, in Lisp,
'(Control CR LF)
(a list of the symbols), when we actually want
`(,Control ,CR ,LF)
(a list of the values that the symbols refer to). How can we get that quasiquotation in Factor? The most obvious way is using make:
[ Control , CR , LF , ] { } make
But that solution is pretty wordy, and not nearly fun enough. Here's another idea: what if you make the grapheme class words expand directly, at parsetime, to their value? This can be done if you use
: Any 0 parsed ; parsing
: Extend 1 parsed ; parsing
! ...
But who wants to write this eight times? I don't! We can abstract it into a parsing word CONST: to do this for us:
: define-const ( word value -- )
[ parsed ] curry dupd define-compound
t "parsing" set-word-prop ;
: CONST:
CREATE scan-word dup parsing?
[ execute dup pop ] when define-const ; parsing
But in this particular code, we use a particular pattern, similar to C's enum. Why not abstract this into our own ENUM:?
: define-enum ( words -- )
dup length [ define-const ] 2each ;
: ENUM:
";" parse-tokens [ create-in ] map define-enum ; parsing
Going back to the original problem, the ENUM: parsing word allows me to write
ENUM: Any L V T Extend Control CR LF graphemes ;
to specify the grapheme classes, without having to care about which number they correspond to.
This solution isn't perfect. The problem here is that this completely ruins homoiconic syntax for all uses of the constants. By "homoiconic syntax," I mean that see can print out the original source code, with the only real difference being whitespace. Since each constant is replaced by its number at parsetime, see prints the number rather than the name. A macro using compiler transformations, which would preserve homoiconic syntax by using a different strategy, might be preferable. But I still wanted to share this little thing with you all.
Note: At the time I wrote this, this strategy made sense. But now, I'm thinking it'd be better to just go with a regular C-ENUM:, which should be renamed ENUM:. This is partly because of changes in the parser which make it more difficult to use parsing words defined in the same source file.