One is, I expanded my previous survey of algorithms for XML diffing, and the result is here [PDF].
I've been working on a few libraries in Factor. One is
extra/delegate, which now interacts properly with reloading. For example, if you define a protocol, then say that a class consults something for that protocol, and then redefine the protocol to include more generic words, the consulting class will be automatically updated. Unfortunately, this doubled the size of the code, give or take. Slava changed duplex-streams to use extra/delegate, and the code is much simpler now, as it previously amounted to manual delegation. I got rid of mimic because it's unsafe and violates encapsulation in unexpected ways.
Another little thing I made was
extra/lcs, a library for calculating Levenshtein distance between two strings, the longest common subsequence of two strings, and an edit script between two strings. Because the LCS problem and Levenshtein distance are duals, I was able to share most of the code between them. I used the simple quadratic time and space algorithm that Wikipedia describes rather than the better O(nd) algorithm [PDF] commonly called the GNU diff algorithm. I'll upgrade it to this once I understand the algorithm. This replaces
extra/levenshtein. I expected it to be significantly faster, because the old code used dynamically scoped variables and this uses statically scoped locals, but the speed improvement turned out to be just around 4% in small informal benchmarks on short strings.
Now, I'm working on the Unicode Collation Algorithm. The basics were simple, but I'm still unsure how to recognize collation graphemes efficiently in general. Either way, I discovered a bug in normalization: my insertion sort, used for canonical ordering of combining marks, wasn't a stable sort as required for normalization. It was actually an anti-stable sort: it reversed subsequences which were of the same sort key. That was really stupid of me. I'm going to work on incorporating existing test suites for things like this. For the test suite for collation, all but 8000 of 130,000 tests pass, making it far from ready.