I hear what you're saying. "That's not cool; that sounds really boring and technical. Also, why do you need a generic protocol in the first place." To your first objection: remember when you first learned how to implement
map
on linked lists? Wasn't that so simple, yet useful and interesting? But then you realized that thousands of other beginning programmers wrote the exact same snippet of Scheme, and yours was pretty much useless. Well, I'm writing all of those things again, except in a way that no one else (to my knowledge) has done before. More on that later.The reason it's useful is a bit more obvious. Imagine you have a piece of code which uses a hashtable as an associative array. Then, after writing 1000 lines of code, you realize (oops) that, fairly often, you have to iterate over your data in alphabetical order. Also, the data is all in strings, and there are many strings starting with the same word. A hashtable works, but it's definitely not the best option. So trie to the rescue! But now you have to go back and change all of your code from using the hash library to the trie library. It shouldn't be so much work to do that. With a generic protocol, all you have to change is the code initializing the trie/hashtable to use the appropriate type. As another example, imagine you're implementing a database. I have no idea how databases are implemented, but let's imagine you use an AVL tree if there are 1024 or fewer entries, and if there are more, you use a B-tree to represent the data. This way, the switch can be made internally without other functions having to care. A third use, probably the most immediate benefit, is giving the Factor core a really solid set of words for manipulating alists without polluting the namespace, complete with all the derived operations you could want.
Well, those may be a little contrived, but back to the fun part, doing
map
all over again! Well, here's the current incarnation of the protocol that all assocs (our abbreviation for associative array) use:
GENERIC: assoc* ( key assoc -- value/f ? )
GENERIC: set-assoc ( value key assoc -- )
GENERIC: key? ( key assoc -- ? )
GENERIC: new-assoc ( size exemplar -- newassoc )
G: assoc-find ( assoc quot -- key value ? )
1 standard-combination ; inline ! quot: key value -- ?
GENERIC: remove-assoc ( key assoc -- )
GENERIC: clear-assoc ( assoc -- )
GENERIC: assoc-size ( assoc -- n )
! Additionally, clone must be implemented properly
Most of those have a clear analogue in the current hashtable implementation.
key?
is hash-member?
and H{ } new-assoc
is <hashtable>
. The only really different operation is assoc-find
, which at first looks like a rather curious choice for the protocol.Probably the most confusing part of
assoc-find
is that it doesn't use the standard GENERIC:
declaration, instead using G: assoc-find 1 standard-combination ;
. The purpose of this is to make it dispatch off the second item on the stack instead of the first one. G: some-generic-word 0 standard-combination ;
is equivalent to the normal GENERIC: some-generic-word
.The idea of
assoc-find
is to be analogous to find
. It traverses through the assoc in order (if an order exists), returning the first key/value pair that the given predicate succeeds on. It also returns a boolean of whether or not anything was found, which is necessary in the unlikely event was a key of f
and a value of f
. This probably seems strange because there currently is no hash-find
in the Factor library, and nothing has needed it. But on top of assoc-find
, any other useful combinator can easily and efficiently be built.Think about it this way: what combinator could be used that would give you the following features:
- The ability to execute a quotation on each key/value pair of an assoc (needed for
assoc-each
) - The ability to test, in a short-circuited way, if any or all quotations succeed on a particular predicate (needed for
assoc-all?
) - The ability to return the key/value pair associated with the success of a predicate (well, this is the exact function of
assoc-find
in the first place, but it's needed forrassoc
)
To all these problems,
assoc-find
is the most natural solution. The strategy for implementing assoc-each
? Just call the quotation given, and then return f to assoc-find
so it keeps going. At the end, drop all three outputs. Here's the implementation:
: assoc-with 2swap [ >r -rot r> call ] 2keep ; inline
: assoc-find-with ( obj assoc quot -- key value ? )
swap [ assoc-with rot ] assoc-find
>r >r 2nip r> r> ; inline
: assoc-each ( assoc quot -- )
swap [ rot call f ] assoc-find-with 3drop ;
(Yes, I did steal
assoc-with
from the existing hash-with
code verbatim.)Inspired by the sequences protocol that already exists in Factor, in this style, I can implement all other operations just once and have them work on all other types of assocs. I'm unaware of any similarly designed system. In Java, there's a
Map
interface, but since interfaces and abstract classes are distinct in Java, and Map
is not an abstract class, there are no shared method bodies between Map
implementations. In Ruby and Python, you can overload the []
or __getitem__
methods (and many other languages have similar operator overloadings) but there's no organized protocol or shared code around this. Common Lisp uses different functions for different data structures. The two I'm still wondering about are Smalltalk and Haskell, because an assoc superclass or typeclass would make a huge amount of sense in those languages especially, though the Haskell one would be limited to some degree. If you know anything about these or other things, I'd be really interested to hear your comments. Even if another language as a similar assoc protocol, I'd be surprised if they made the same weird choice to use assoc-find
as their fundamental traversal operation.Another thing I'm wondering about is the possibility of "virtual assocs," analogous to the virtual sequences already present in Factor. The basic idea is, to have a sequence, all you really need are
nth
and length
operations, maybe also a set-nth
. This allows for cool things like, say, a reversed
virtual sequence, which merely modifies indexing so that you are looking from the end instead of the beginning. There is no need to actually copy the underlying sequence. So why not do the same for assocs? It'd definitely be possible, but I don't know if it would ever be useful. I can't really think of a use, though it'd always be easy to implement one once someone thinks of one.Anyway, this isn't much of a new idea, just a recombination of a bunch of old ones. It's really a great example of the benefits of generic words. It's interesting to note that this would be just as easy to implement in a class-based system as it would be in a typeclass system, though for initial rough development, it's always easier to use generic words or typeclasses which can be defined after the fact.
Update: It turns out that, in Haskell, a library called Edison implements the same sort of thing that I've done with type classes. It's made by (who else?) Chris Okasaki, author of Purely Functional Data Structures. Unfortunately, I can't fully understand either that book or Edison (at least from the associated documentation). I'm sure the library makes mine look extremely crude by comparison, but I think it's a good thing to have your library be relatively simple and comprehensible to someone who doesn't care about a typeclass hierarchy, just about a couple functions. At the same time, I see how the type classes make sense to allow for fine-grained functionality differences.
No comments:
Post a Comment