Tuesday, January 2, 2007

XML combinators

I really need to work on making my writing make sense, because I think my last post failed at that. The XML library that I've written has a lot of combinators defined for processing, and in this blog post, I'm going to try to describe them as well as I can.

The basic idea behind the XML combinators is to let XML be processed the same way Lispers (and Factorers) have always processed lists and other sequences: with map, each, find, subset and, now, inject. These operations are slightly more complicated on a tree structure like XML than on the flat sequences they were originally designed for. Fortunately, Factor's generic word system makes this relatively easy.

There's one consideration in designing these that I wasn't happy with: since I had to iterate over all nodes, not just the leaf nodes, the result of any combinator can be affected by the order in which nodes are visited. I made a relatively arbitrary decision and said that parent nodes are processed first, and then their children, down to the leaves.
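To make the ordering concrete, here is a small sketch of the parent-first traversal. It is written in Python rather than Factor, and the Tag class and the preorder function are hypothetical stand-ins for illustration, not part of the XML library described here.

```python
from dataclasses import dataclass, field


@dataclass
class Tag:
    """Hypothetical stand-in for an XML tag: a name plus child nodes."""
    name: str
    children: list = field(default_factory=list)


def preorder(node, visit):
    """Visit a node first, then each of its children from left to right."""
    visit(node)
    if isinstance(node, Tag):
        for child in node.children:
            preorder(child, visit)


# <a><b>text</b><c/></a>
doc = Tag("a", [Tag("b", ["text"]), Tag("c")])

seen = []
preorder(doc, lambda n: seen.append(n.name if isinstance(n, Tag) else n))
print(seen)  # ['a', 'b', 'text', 'c']
```

Each parent shows up before anything nested inside it, which is the order chosen above. Visiting children first would give a different sequence for any operation that depends on order.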

A note before the code of xml-each, the simplest of the combinators: this uses Factor's generic words, which dispatch only on the top of the stack. xml-each must dispatch on the class of the XML tag, not the quotation. However, the Factor convention for combinators is that the quotation is on the top of the stack, because this looks better syntactically. So I need to do a swap before entering the generic word. That said, here's the combinator:

GENERIC: (xml-each) ( quot tag -- ) inline
M: tag (xml-each)
    [ swap call ] 2keep
    tag-children [ (xml-each) ] each-with ;
M: object (xml-each)
    swap call ;
M: xml-doc (xml-each)
    delegate (xml-each) ;
: xml-each ( tag quot -- ) ! quot: tag --
    swap (xml-each) ; inline
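For readers who don't know Factor, the same shape can be sketched in Python. This is only an analogue under assumptions: functools.singledispatch dispatches on the first argument, much as a Factor generic word dispatches on the top of the stack, while Python's convention (as in map) puts the function first. The Tag and XmlDoc classes and the function names are hypothetical.

```python
from dataclasses import dataclass, field
from functools import singledispatch


@dataclass
class Tag:
    """Hypothetical tag: a name plus a list of child nodes (tags or strings)."""
    name: str
    children: list = field(default_factory=list)


@dataclass
class XmlDoc:
    """Hypothetical document wrapper around a root tag."""
    root: Tag


@singledispatch
def _xml_each(node, fn):
    # Default case (strings and any other leaf): just call the function.
    fn(node)


@_xml_each.register(Tag)
def _(node, fn):
    # Call the function on the tag itself, then recurse into its children.
    fn(node)
    for child in node.children:
        _xml_each(child, fn)


@_xml_each.register(XmlDoc)
def _(node, fn):
    # A document just forwards to its root tag.
    _xml_each(node.root, fn)


def xml_each(fn, node):
    """Public entry point: function first, then swap for the dispatcher."""
    _xml_each(node, fn)


doc = XmlDoc(Tag("p", ["hello ", Tag("b", ["world"])]))
names = []
xml_each(lambda n: names.append(n.name if isinstance(n, Tag) else n), doc)
print(names)  # ['p', 'hello ', 'b', 'world']
```

The wrapper exists only to reorder the arguments, which is the same job the swap does in xml-each.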

The rest of the combinators are implemented in much the same way, with only trivial changes from one to the next. Unlike some combinators (see my previous post), this one is very simple to implement, in no small part because I have existing combinators to help. Unfortunately, it doesn't seem to be possible to abstract a general "lift" combinator (analogous to liftM in Haskell for monads) that would turn each into xml-each and map into xml-map, despite the similarities between them.

A really simple usage of xml-each is [ write-item ] xml-each, which prints out each element of the given XML tag, document, or other type of element.

As a more in-depth example, xml-find (which works just like find, but without returning an index) is used to implement the equivalent of .getElementById(), called get-id:

: get-id ( tag id -- elem )
    swap [
        dup tag? [
            "id" prop-name-tag
            [ string? ] subset concat
            over =
        ] [ drop f ] if
    ] xml-find nip ;
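The logic of get-id may be easier to follow in a Python analogue. This is a sketch under assumptions, not the library's code: the Tag class, its attrs dictionary, and the xml_find and get_id functions are all hypothetical, and attributes are modeled as plain strings rather than as the sequences that the Factor version filters and concatenates.

```python
from dataclasses import dataclass, field


@dataclass
class Tag:
    """Hypothetical tag with a name, an attribute dict, and child nodes."""
    name: str
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def xml_find(node, pred):
    """Return the first node, parent first, that satisfies pred, else None."""
    if pred(node):
        return node
    if isinstance(node, Tag):
        for child in node.children:
            found = xml_find(child, pred)
            if found is not None:
                return found
    return None


def get_id(tag, wanted):
    """Find the first tag whose "id" attribute equals wanted."""
    # Non-tag nodes (such as strings) never match, like the [ drop f ] branch.
    return xml_find(
        tag, lambda n: isinstance(n, Tag) and n.attrs.get("id") == wanted
    )


doc = Tag("html", {}, [
    Tag("div", {"id": "main"}, ["hi"]),
    Tag("div", {"id": "side"}),
])

print(get_id(doc, "side").name)  # div
print(get_id(doc, "nope"))       # None
```

As in the Factor version, the predicate first checks that the node is a tag, so text nodes are skipped without an attribute lookup.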

(Note: one weakness of the XML utilities that currently exist is that it's annoying to deal with XML tag attributes, though this should be fixed when I do more real work using them.)
