Sunday, January 25, 2009

Factor supports XML literal syntax

Factor can now express XML as literals in code. There's a new library, xml.interpolate, which lets you build an XML chunk or document by interpolating values into it, either by naming locals or with a fry-like positional syntax. Here's a taste from the syndication vocab, which creates an Atom feed:
: feed>xml ( feed -- xml )
    [ title>> ]
    [ url>> present ]
    [ entries>> [ entry>xml ] map ] tri
    <XML
        <feed xmlns="http://www.w3.org/2005/Atom">
            <title><-></title>
            <link href=<-> />
            <->
        </feed>
    XML> ;
This could also be written with locals:
:: feed>xml ( feed -- xml )
    feed title>> :> title
    feed url>> present :> url
    feed entries>> [ entry>xml ] map :> entries
    <XML
        <feed xmlns="http://www.w3.org/2005/Atom">
            <title><-title-></title>
            <link href=<-url-> />
            <-entries->
        </feed>
    XML> ;
Here's an example with more complicated logic:
"one two three" " " split
[ [XML <item><-></item> XML] ] map
<XML <doc><-></doc> XML>
whose prettyprinted output (using xml.writer:pprint-xml) is
<?xml version="1.0" encoding="UTF-8"?>
<doc>
  <item>
    one
  </item>
  <item>
    two
  </item>
  <item>
    three
  </item>
</doc>
The word <XML starts a literal XML document, and [XML starts a literal XML chunk. (A document has a prolog and exactly one top-level tag, whereas a chunk can be any kind of snippet, as long as its tags are balanced.) Values are spliced in with a tag like <-foo->, which names a local, or with a bare <->, which takes the next value from the stack. This syntax is a strict superset of XML, since a tag name in XML 1.0 is not allowed to start with -.
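To make the document/chunk distinction and the two splice styles concrete, here is a minimal sketch that builds two chunks and splices them into one document. The word names (item-chunk, link-chunk, demo-doc) and the xml-literal-sketch vocabulary are made up for illustration. The USING: line is also an assumption: this post calls the library xml.interpolate, while later Factor releases, as far as I know, ship the same words as xml.syntax, so adjust it to match your image.

```factor
! Sketch only: every word defined here is hypothetical.
! Assumption: the literals live in xml.syntax (called
! xml.interpolate in this post).
USING: kernel locals sequences splitting xml.syntax xml.writer ;
IN: xml-literal-sketch

! Positional splice in a chunk: <-> takes its value from the stack.
: item-chunk ( string -- chunk )
    [XML <item><-></item> XML] ;

! Named splice in a chunk: <-url-> reads the local called url.
:: link-chunk ( url -- chunk )
    [XML <link href=<-url-> /> XML] ;

! A document has exactly one top-level tag; the chunks go inside it.
:: demo-doc ( url strings -- xml )
    url link-chunk :> header
    strings [ item-chunk ] map :> body
    <XML
        <doc>
            <-header->
            <-body->
        </doc>
    XML> ;

"http://factorcode.org/" "one two three" " " split demo-doc pprint-xml
```

The chunks are ordinary values, so they can be built in separate words, collected with map, and spliced in later, either one at a time or as a whole sequence.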

It took me just an evening to hack this up. The change to the XML parser is under 15 lines of code, and all of the interpolation is under 75 lines. Best of all, unlike Scala's XML literals, this doesn't affect the core of the language or make anything else more complicated. It's just nicely tucked away in its own little vocabulary. I plan on replacing uses of html.components with this, once I make some tweaks.

7 comments:

Ludovic Kuty said...

Your posts are really interesting. Thanks for all that.

Daniel Spiewak said...

Interesting! One of those situations where concatenative languages show their true power as a DSL powerhouse. It is worth noting however that technically Scala's XML literals are partitioned off in their own discrete module as well, just not as nicely as this (not a straight API, unfortunately). Scala's XML support is handled by the parser, but it's hardly muddying the waters.

With that said, I despise Scala's XML literals, it's just too bad that it isn't syntactically powerful enough to do something like this.

Daniel Ehrenberg said...

By the way, the fact that this is possible has nothing to do with Factor being concatenative. It's just a property of the way parsing words work. It'd also be possible in MetaOCaml, Common Lisp and VPRI's language, I think, though I don't know how those systems handle module scope of syntax extensions.

Daniel Spiewak said...

@Daniel

Hmm, I'm not sure that's true. I'm not familiar with MetaOCaml or VPRI, but I'm absolutely certain that you couldn't achieve a *literal* XML syntax in Lisp. Macros are powerful, but you still need to reduce to an S-Expression at some level.

Concatenative languages have the serious advantage of being able to parse an undifferentiated token stream. The intrinsic syntax is so simple that there are almost no limitations on what is possible above and beyond the core.

Daniel Ehrenberg said...

Daniel,

I said Common Lisp rather than Lisp because I think this is possible with CL's reader macros. Here's a blog post about them: http://dorophone.blogspot.com/2008/03/common-lisp-reader-macros-simple.html. A limitation of these is that they're not module-scoped, and they have to have a special first character.

Daniel Spiewak said...

I stand corrected! I was unaware that Common Lisp allowed the definition of macros at the reader level. Of course, this would open up some significant possibilities. The unique character requirement is a little annoying, but shouldn't be a serious limitation for an XML DSL.

Anonymous said...

VPRI's OMeta can definitely do it, it's designed for modifying language syntax on the fly. One of their projects is a TCP implementation in a couple hundred lines of code, built by creating a syntax that exactly matches the ascii diagrams in the RFC.