Monday, December 10, 2007

Multiline string literals in Factor

It's always annoyed me somewhat that Factor strings can only be on one line and that there is no mechanism for anything like the "here documents" that Perl has. So I decided to write one myself. At this point, I don't really need it and have forgotten what I wanted it for, but it was still a fun exercise.

I started out thinking I should do something similar to what some other languages do: write a word, maybe called <<<, followed by some string that delimits the multiline string literal. But I realized this wouldn't be the most idiomatic way in Factor. First, if you're making a multiline string literal, why would you ever have it be within a word? For constants like this, it's considered best practice to put them in their own separate words. Second, why do I need to give the option of choosing your own terminator? What's wrong with just using a semicolon, like other Factor definitions?

Putting this together, I came up with the following syntax:

STRING: something
Hey, I can type all the text I want here
And it can be over multiple lines
But without the backslash-n escape sequence!
;

The final line, the semicolon, has to have no whitespace before or after it. This allows for practically any useful string to be written this way. The syntax will compile into something like this:

: something
"Hey, I can type all the text I want here\nAnd it can be over multiple lines\nBut without the backslash-n escape sequence!" ;

With a previous parser design, multiline string literals like this were impossible, but now they can be done in 11 lines. I packaged this up and put it in my repository under extra/multiline so others can use it.

Using the internals of the parser, the first word advances the parser state one line and returns the text of the new line.

: next-line-text ( -- str )
lexer get dup next-line line-text ;

The next two words do the bulk of the parsing. They advance the parser's current line until reaching a line consisting only of ";", and then advance the line one more time. While moving forward, a string is built up from all of the lines, each followed by a newline; the trailing newline is trimmed at the end.

: (parse-here) ( -- )
next-line-text dup ";" =
[ drop lexer get next-line ] [ % "\n" % (parse-here) ] if ;

: parse-here ( -- str )
[ (parse-here) ] "" make 1 head*
lexer get next-line ;
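To see why parse-here ends with 1 head*, here is a small sketch that can be tried in the listener. It imitates what (parse-here) accumulates for a two-line body; the strings "first" and "second" are made up for illustration, and the vocabularies that need to be in scope for make and % may differ between Factor versions.

```factor
! Each line is appended followed by "\n", so the accumulated
! string ends with one newline too many; 1 head* drops it.
[ "first" % "\n" % "second" % "\n" % ] "" make 1 head*
! leaves "first\nsecond" on the stack
```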

Finally, the word STRING: puts it all together, defining a new word using a string gathered with parse-here.

: STRING:
CREATE dup reset-generic
parse-here 1quotation define-compound ; parsing
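As a usage sketch, assuming the vocabulary is loaded under the name multiline (it lives in extra/multiline), a word defined with STRING: behaves like any other word that pushes a string. The word name greeting and its text are hypothetical.

```factor
USING: io multiline ;
IN: scratchpad

! Defines a word named greeting that pushes the two-line string
STRING: greeting
Hello,
world!
;

! Writes the string to the output stream, followed by a newline
greeting print
```

Because the result is an ordinary word, it can be called from any other definition without special handling.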

There are downsides to having an extremely flexible syntax like Factor's. Things can be less predictable when libraries can alter the fundamentals of syntax. It'd be impossible to create a fixed BNF description of Factor syntax. Additionally, the particular structure of Factor sometimes encourages syntax extensions that are heavily dependent on the details of the current implementation. But the upside is that we can do things like this. I think it's worth it.
