Thursday, February 5, 2009

XML pattern matching

I've implemented syntax for pattern matching on interpolated XML literals. This is a Scala-inspired feature which may or may not be useful, but is definitely cool-looking. Here's a sample of code:
: dispatch ( xml -- string )
{
{ [ [XML <a><-></a> XML] ] [ "a" prepend ] }
{ [ [XML <b><-></b> XML] ] [ "b" prepend ] }
{ [ [XML <b val='yes'/> XML] ] [ "yes" ] }
{ [ [XML <b val=<->/> XML] ] [ "no" prepend ] }
} switch ;

And here's some examples of what it does:
[XML <a>pple</a> XML] dispatch
=> "apple"
[XML <b>anana</b> XML] dispatch
=> "banana"
[XML <b val="yes"/> XML] dispatch
=> "yes"
[XML <b val="where"/> XML] dispatch
=> "nowhere"

The pattern matching here is based on my inverse library. Hopefully you get the high-level idea of how XML pattern matching works. A caveat is that namespaces are ignored, as in Scala's XML pattern matching, and I haven't thought of a good way to incorporate namespaces.

Jeff Attwood recently wrote a blog post about XML literal syntax*, and how it's a big help. In part, I agree with his statement that it lets you write cleaner code to have XML literals mixed in with everything. That's why I implemented them in Factor.

But I think it's insane to have them built in the way that Scala and, apparently, VB.NET have. Just think if the designers of Java had this idea 15 years ago. They would have made SGML literal syntax rather than XML, and there would be no way to remove this for backwards compatibility concerns. Every implementation of Java would have to contain a full-featured SGML parser. Generating HTML would be easier and allow for very pretty code, though.

XML won't be so prominent forever. It'll always be there, of course, just like SGML is still there in non-XML HTML and DocBook files which refuse to die. But I wouldn't be surprised if, in 20 years, what we want is a literal syntax for something other than XML. We'll still be using a lot of the same programming languages as were invented today, though.

Factor's XML literal syntax and XML pattern matching are just libraries, and I wouldn't want it any other way. Factor's parser needed no adjustments at all to allow for XML literals. If an XML literal is written when the vocabulary xml.literals isn't in scope, it's a syntax error. Sure, you need delimiters [XML XML] around the XML literals, but this is a small price to pay.

In 20 years, if XML isn't useful any longer, then Factor programmers will be just as well off as if they are. If there's a new data format, it'll just be a new library to download, and you'll have literal syntax for that data format. If you were using VB.NET, though, you'd have to upgrade to the most recent version of the language, with a parser that's been vastly complicated.


*He actually talked about HTML literal syntax, but that's such a ridiculous idea that I know he just mistyped. There's no way to tell where an HTML fragment ends, for one, and the HTML DTD would have to be part of the core grammar of the language. You would need delimiters around where the HTML starts and ends, and none of his code fragments have those delimiters. The X key is right next to the letters H and T, so it's an honest typo. His pseudocode sample just before the VB.NET fragment must have also been a simple mistake, as it would seem to imply that either XML literals get printed immediately or that there is some implicit way of collecting them.

4 comments:

Slava Pestov said...

In 20 years, Jeff Atwood won't be useful any longer, but XML probably will be.

br1 said...

Xml literals in VB integrate with the type system. They pay a hefty implementation cost, but end up with a more powerful feature. It's not really comparable; Xml literals in Factor are just better sintax.

Slava Pestov said...

Factor's XML literals are checked at parse time for well-formedness, and there is no way to generate malformed XML. Also the code generated is efficient since there's a compile-time transform going on. So what exactly does the VB approach buy us, other than a facepalm-inducing complication of the grammar?

br1 said...

I was wrong. I thought VB integrated schema checking, but it only has support for XML IntelliSense.