Sunday, December 23, 2007

Books that should exist

As you can probably guess by the existence of this blog, I love to write. Part of what I love about it is the actual act of writing, but it's really more that I want things to be written that haven't been written before. I want to write stuff that I wish I could have read. Right now, there are three books and one journal article that I think should exist but don't. Hopefully, I'll be able to write some of these some time in my life.

Implementing Unicode: A Practical Guide

One thing that struck me when beginning to write the Unicode library is that there aren't many books about Unicode. The two I found in my local Barnes and Noble were the Unicode 5.0 Standard and Unicode Explained. Looking on, I can't find any other books that address Unicode 3.1 (the first version to actually assign characters beyond the first of Unicode's 17 planes) or newer in detail, ignoring more specialized books.

Both of these are great books, but they aren't optimal for figuring out how to implement a Unicode library. Unicode Explained focuses on understanding Unicode for software development, but shies away from details of implementation. The Unicode Standard explains everything, but it often gets overly technical and can be difficult to read for people not already familiar with the concepts described. A Unicode library implementor needs something in the middle. Unicode Demystified might be the right one, but it describes Unicode 3.0, so it is in many ways outdated.

I wish a book existed which explained Unicode in suitable detail for most library implementors. If this book continues to not exist for many years, I might just have to write it myself. This, however, would be more difficult and less fun than my other book ideas.

[Update: After getting my hands on Unicode Demystified, I realize that I should not have thrown it aside so quickly. It's a great book, and nearly all of it is still relevant and current. It looks like the ebook version I have is updated for Unicode 3.1.]

Programming with Factor

Factor's included documentation is great, but it's not enough, by itself, to learn Factor. I, and probably most people who know Factor, learned through a combination of experimenting, reading blog posts and the mailing list, and talking on #concatenative. Many people will continue to learn Factor this way, but it still seems somehow insufficient. It should be possible to learn a programming language without participating in its community.

Of course, we can't write a book on Factor until we get to see what Factor will look like in version 1.0. But I'm confident that this book will be written, even if it goes unpublished in print, and I'm fairly sure that I'll have at least some part in it. Maybe it'll be a big group effort by the whole Factor community.

What I'm thinking is that we could have a book which teaches programming through Factor, to people who aren't yet programmers. I've talked a lot about this with Doug Coleman, and we agree that most programming books are bad; we should make a new book that does things very differently. But we can't agree on, or even imagine, just how...

The Story of Programming

I, like many of you reading this blog, have an unhealthy interest in programming languages. Mine may be a little more unhealthy than yours. Whenever I hear the name of a programming language that I don't know of, I immediately need to read about it, to get some basic knowledge of its history, syntax, semantics and innovations.

Most study of programming languages works by examining the properties of the languages themselves: how functional programming languages are different from imperative object-oriented languages and logic languages and the like. But what about a historical perspective? The history of programming languages is useful for the same reason other kinds of historical inquiry are useful. When we know about the past, we know more about the way things are in the present, and we can better tell what will happen in the future. The history of programming languages could tell us what makes things popular and what makes things ultimately useful.

Unfortunately, not much has been done on this. Knowledge of programming language history is passed on, unsourced, with as much verifiability as folk legend. The ACM has held three conferences called HOPL on this subject over the past 30 years, so all the source material is there. But apart from a book published in 1969, this is all I can find as far as a survey history of programming languages goes.

There is a limit to how much academic research papers can provide. The proceedings of the HOPL conferences aren't bedtime reading, and they don't provide much by way of a strong narrative. A new book could present the whole history of programming from the first writings about algorithms to modern scripting languages and functional programming languages so it's both accessible to non-programmers and interesting to programmers. As far as I know, no one's really tried. But it would be really fun to try.

Inverses in Concatenative Languages

Most of my ideas are either bad or unoriginal. But there's one idea that I came up with that seems to be both original and not horrible, and that's the idea of a particular kind of concatenative pattern matching (which I blogged about, and Slava also wrote about in relation to units).

The basic idea is that, in a concatenative language, the inverse of foo bar is the inverse of bar followed by the inverse of foo. Since there are some stacks that we know foo and bar will never return (imagine bar is 2array and the top of the stack is 1), this can fail. From this, we get a sort of pattern matching. Put more mathematically, if f and g are functions, then (f ∘ g)⁻¹ = g⁻¹ ∘ f⁻¹.

In my implementation of this, I made it so you can invert and call a quotation with the word undo. We don't actually need a full inverse; we only need a right inverse. That is, it's necessary that [ foo ] undo foo be a no-op, but maybe foo [ foo ] undo returns something different. Also, we're taking all functions to be partial.
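To make the idea concrete, here is a minimal Python sketch of inverting a quotation. It is not the Factor implementation; the names Word, Mismatch, literal, call and undo below are made up for illustration, a stack is modeled as a Python list, and a two-element array is modeled as a tuple. Each word carries a forward function and an inverse, and undoing a quotation runs the inverses in reverse order. An inverse that meets a stack its word could never have produced raises Mismatch, which is where the pattern matching comes from.

```python
class Mismatch(Exception):
    """Raised when an inverse meets a stack its word could never have produced."""


class Word:
    """A hypothetical stack word: a forward function paired with its inverse."""

    def __init__(self, name, forward, inverse):
        self.name = name
        self.forward = forward
        self.inverse = inverse


def literal(value):
    """Forward pushes the value; the inverse pops it, but only if it matches."""

    def push(stack):
        return stack + [value]

    def match(stack):
        if not stack or stack[-1] != value:
            raise Mismatch(f"expected {value!r} on top of the stack")
        return stack[:-1]

    return Word(repr(value), push, match)


def _pair(stack):
    # Forward direction of 2array: replace the top two items with one pair.
    *rest, a, b = stack
    return rest + [(a, b)]


def _unpair(stack):
    # Inverse of 2array: only a two-element array can be taken apart.
    if not stack or not isinstance(stack[-1], tuple) or len(stack[-1]) != 2:
        raise Mismatch("top of the stack is not a two-element array")
    *rest, (a, b) = stack
    return rest + [a, b]


two_array = Word("2array", _pair, _unpair)


def call(quotation, stack):
    """Run the words left to right."""
    for word in quotation:
        stack = word.forward(stack)
    return stack


def undo(quotation, stack):
    """Run the inverses right to left: the inverse of foo bar is bar's, then foo's."""
    for word in reversed(quotation):
        stack = word.inverse(stack)
    return stack


pattern = [literal(1), two_array]   # roughly [ 1 2array ]
print(call(pattern, [5]))           # [(5, 1)]
print(undo(pattern, [(5, 1)]))      # [5]
try:
    undo(pattern, [(5, 2)])         # the second element is not 1
except Mismatch as err:
    print("no match:", err)
```

Undoing the quotation against a pair whose second element is 1 extracts the first element, and any other stack fails to match. Only the right-inverse property is relied on here: undoing and then calling gives back the stack you started with.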

Anyway, I think this is a cool idea, but I doubt it could fill a book. I want to write an article about it for an academic journal so that I can explain it to more people and expand the idea. It could be made more rigorous, and it could use a lot more thought. I hope this works.

When will I write these?

So, I hope I'll be able to write these things. Unfortunately, I'm not sure when I could. I need to finish my next 10 years of education, which won't leave tons of free time unless I give up blogging and programming and my job. Also, I'm not sure if I'm capable of writing a book or an article for an academic journal, though maybe I will be in 10 years when I'm done with my education. It wouldn't be so bad if someone stole my thunder and wrote one of these things because what I really want to do is read these books.

Update: Here are a couple more books (or maybe just long surveys) that should exist but don't: something about operational transformation, and something about edit distance calculations and merge operations for things more complicated than strings. Right now, you can only learn about these things from scattered research papers. (I never thought I'd find the references at the end so useful!)


Arie said...

I do agree! Indeed the Factor docs are great, but as a Factor beginner I just seem to be unable to really get started straight away. What I see is a good reference for the intermediate to experienced user. However, a user guide and cookbook is needed. Perhaps it is a good idea to start building a great FAQ which later on will serve as the basis for a great Factor beginners book?
Keep up the good work!
Kind regards,

Unknown said...

I do have an FAQ at, but it's far from a general guide for beginners. I've written some introductory blog posts, which you can access at These are far from enough, though. I've noticed that nearly all serious Factor contributors hang out occasionally at, where you can ask any question you want about Factor (or other concatenative programming languages), and I'd strongly recommend that you go there.

People, please make your Blogger profiles public so I can contact you!

Kevin Marshall said...

Hey I'm just stumbling onto (or rather into) Factor today for the first time thanks to a mention from Zed Shaw...and like you, I'm always excited to find out about new languages and such so I'll probably spend the next few days/nights without sleep, trying to get a grasp on all that's already there and play around with it all.

Anyway, I agree that most programming books stink (even the one I helped recently write on Active Record kind of stinks in my opinion!)...but they are still better than having nothing to work from.

So that being said, you should DEF. write some (or all) of the book ideas you've mentioned...my suggestion is to just start writing a little bit a day for whatever one you want to tackle first; once you've got a few chapters/ideas together, it will be simple for you to get a publisher like Apress or O'Reilly to edit/print the book for you (I have a few contacts at those two that I would be happy to pass on to you)...since you are an early adopter and therefore a high-level person in the language they would be thrilled to have you do a book for them I'm sure.

So keep up the great work and forge ahead...there will be lots of people like me ready to follow you down the path :-)

Unknown said...

Thanks a lot for the offer of connections and advice on getting started writing! But the problem isn't getting published but rather finding time to write, since I'm focusing most of my effort on school work and free time on blogging (which I want to keep up) and maintaining/improving the libraries I've written in Factor. Maybe, though, I'll start writing some of these books in my remaining spare time (though probably not the Unicode one, as I still know very little about that), but they won't be anywhere near done for a while.

Kevin Marshall said...

I hear you! Time is the new 'money', I find myself spending money all the time now just so I can have more 'time' to do the things I want to do...maybe you can do a little blurb on your blog once a day and then offline try and expand it even just a little more...then perhaps throw it up online as a wiki or an open-source book...eventually it will all get written and we'll all benefit from it!

Since I'm just getting started with Factor tonight (right now)...and even though I've been coding since about '93 or so, I haven't really dealt with any 'stack' languages before...I figure it's a good opp. for me to keep a running log as I stumble through it...with any luck it can maybe be a good 'getting started from scratch' type of doc for others down the road...

First though I've got to get myself going...I've downloaded Factor already and get the basic idea; I've been able to run some simple testing stuff like adding numbers and silly things like that...I just need to pick some type of 'project' to try and use as I figure it all out (I always learn a new language/program best when I pick a specific project/task to try and attempt and then start getting into the details of how to do it with the new language)...

On a complete side note - I was wondering if you've ever heard of Omnimark? It's an older, and not so popular, language that has strong roots in SGML and XML...it used to be free (I still have an old Windows based free version I can provide you)...in any case, it's awesome at parsing large files and such and it *might* give you some ideas (or confirmation) in some of the XML stuff you have been doing (forgive me if I'm off base, I am only just starting to read up on what you've been doing via your blog today but I got the sense you were heavy into XML and parsing stuff)...

BTW - you can email me in response if you prefer at info at - oh and my real name is Kevin Marshall if it matters to anyone :-)

Unknown said...

If you haven't seen it, this is not Unicode specific, but it does discuss some related issues, and is really an excellent reference work. Fonts and Encodings.

Unknown said...

Thanks for the pointer, Mike. I did see that book, but never looked at it much before. It'll go somewhere on my reading list, which is really really long.
