Tuesday, April 24, 2007

Thoughts on Church numerals

I've been thinking a bit about the lambda calculus and combinatory logic. This is part of an attempt to create an explicit translation from some set of concatenative combinators to one of those two systems. If I can do this, I'll know that the concatenative combinator set is Turing-complete.

If you've been following this blog, you're probably well aware of the Church-Turing thesis. Wikipedia defines this as "Every function which would naturally be regarded as computable can be computed by a Turing machine" (or, equivalently, by the lambda calculus or any of a large set of machines proven to be equivalent to Turing machines). Of course, this is unprovable: we can't construct the set of functions "naturally regarded as computable," since that would require a solution to the halting problem. (No, an oracle for the halting problem wouldn't help here one bit.)

There are a number of variants on this, one of which is the Strong Church-Turing Thesis. This, in part, states that anything which can be efficiently calculated (in polynomial time) can be efficiently calculated on a Turing machine, or, again, in the lambda calculus. Quantum computers, if they can be built and if BPP < BQP, may prove this wrong. But in a world of computers run mostly on classical physics, it appears to hold.

So, jump to Lambda Calculus 101. Look, here's a cool little lambda! It can do everything! See, it can represent pairs, booleans, and even numbers! For numbers, let's use these Church numerals; woohoo, they're so simple, just use function composition! As Wikipedia explains:

0 ≡ λf.λx. x
1 ≡ λf.λx. f x
2 ≡ λf.λx. f (f x)
3 ≡ λf.λx. f (f (f x))

Et cetera. The basic operations aren't too hard, either (except for the predecessor function, which is really weird). They just take a bit of thought:

plus ≡ λm.λn.λf.λx. m f (n f x)
succ ≡ λn.λf.λx. f (n f x)
mult ≡ λm.λn.λf. n (m f)
exp ≡ λm.λn. n m
pred ≡ λn.λf.λx. n (λg.λh. h (g f)) (λu. x) (λu. u)

See the article Church encoding for more explanation. If you haven't seen this before, it probably doesn't make any sense. Another good explanation, outside of Wikipedia's, is available here [PDF] by Robert Cartwright. (Only the first three pages are actually about the numerals, but the rest are also interesting.)
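
If you want to play with these without firing up a lambda calculus evaluator, here's a rough transcription into Haskell. This is my own sketch with my own names (church, suc, unchurch); exp and pred are left out because they need a more polymorphic type than this simple synonym allows.

type Church a = (a -> a) -> a -> a

zero :: Church a
zero _ x = x

-- suc is succ above, renamed so it doesn't collide with the Prelude's succ
suc :: Church a -> Church a
suc n f x = f (n f x)

church :: Int -> Church a
church 0 = zero
church n = suc (church (n - 1))

plus, mult :: Church a -> Church a -> Church a
plus m n f x = m f (n f x)
mult m n f   = n (m f)

-- Convert back to a machine integer by using "add one" as f and 0 as x.
unchurch :: Church Int -> Int
unchurch n = n (+ 1) 0

-- unchurch (plus (church 2) (church 3))  ==  5
-- unchurch (mult (church 2) (church 3))  ==  6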

Taking addition as the simplest example, you might notice something a little fishy about the computational complexity. The addition of two Church numerals has complexity O(N+M), where N and M are the values of the two numbers. In the computer you're using right now, the addition algorithm in use is actually O(n+m), where n and m are the numbers of digits of the two inputs. (In practice, this is thought of as a constant-time operation, but that's appropriate only if the numbers involved are reasonably small, for example in the range 0-2^32.) This difference may sound trivial, but it's not. If the Church numeral addition algorithm were held to the same standard (where n and m are the numbers of digits in the base 2 representations), its complexity is O(2^n + 2^m): exponential. Adding two numbers around 2^32 would already take billions of reduction steps, so with large numbers it will take an extremely large number of steps. Try to write an RSA implementation using Church numerals, and you'll see how slowly it runs.

But what about the Strong Church-Turing Thesis? Shouldn't it be possible to have a polynomial-time algorithm for this operation, since it's possible within another system? Of course it is! Church came before people were even thinking about complexity, and it was mostly irrelevant to the point of constructing Church numerals, so we can't really fault him for not having come up with a more efficient system. Probably the reason no one else (as far as I can tell) has come up with this is that it's too obvious (and useless) once you understand lambda calculus.

Probably the simplest strategy for this is to represent numbers as a linked list of true and false values (representing 1 and 0), starting at the least significant digit. Without too much difficulty, lists and booleans can be represented in lambda calculus, though it'd be really, really annoying to actually construct the addition operation.

But I was bored in chem and bio today, so here it is, in full untested glory, an efficient implementation of addition in lambda calculus. It probably wouldn't be good enough for implementing RSA, but at least it's asymptotically optimal (I think).

true ≡ λt.λf. t
false ≡ λt.λf. f
or ≡ λa.λb. a true b
and ≡ λa.λb. a b false
not ≡ λa.a false true

cons ≡ λcar.λcdr.λpair.λnone. pair car cdr
nil ≡ λpair.λnone. none

0 ≡ nil
1 ≡ cons true nil
2 ≡ cons false (cons true nil)
3 ≡ cons true (cons true nil)
4 ≡ cons false (cons false (cons true nil))

Y ≡ λf.(λx.f (x x)) (λx.f (x x))
succ ≡ Y (λsu.λn.
  n (λcar.λcdr.
      car (cons false (su cdr))
          (cons true cdr))
    1)
addbool ≡ λnum.λbool. bool (succ num) num

horn' ≡ λa.λb.λc. and a (not (or b c))
horn ≡ λa.λb.λc.
or (and a (and b c))
(or (horn' a b c)
(or (horn' b a c) (horn' c a b)))
anytwo ≡ λa.λb.λc.
or (and a b)
(or (and b c) (and a c))

add ≡ (Y λadd.λcarry.λa.λb.
        a (λcara.λcdra.
            b (λcarb.λcdrb.
                cons (horn cara carb carry)
                     (add (anytwo cara carb carry) cdra cdrb))
              (addbool carry a))
          (addbool carry b)
      ) false
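
For comparison (and for sanity-checking the logic), here's the same idea written directly in Haskell over lists of booleans, least significant bit first. This is just my own sketch mirroring the terms above; the names toBits, fromBits, bsucc and badd are mine.

type Bits = [Bool]   -- least significant bit first

bsucc :: Bits -> Bits
bsucc []           = [True]                -- succ of zero is one
bsucc (True : bs)  = False : bsucc bs      -- low bit 1: flip to 0, carry on
bsucc (False : bs) = True : bs             -- low bit 0: flip to 1, done

addBool :: Bits -> Bool -> Bits            -- addbool above
addBool n b = if b then bsucc n else n

horn :: Bool -> Bool -> Bool -> Bool       -- sum bit: odd number of true inputs
horn a b c = a /= (b /= c)

anytwo :: Bool -> Bool -> Bool -> Bool     -- carry bit: at least two true inputs
anytwo a b c = (a && b) || (b && c) || (a && c)

badd :: Bool -> Bits -> Bits -> Bits       -- addition with an incoming carry
badd carry [] b = addBool b carry
badd carry a [] = addBool a carry
badd carry (x : xs) (y : ys) =
  horn x y carry : badd (anytwo x y carry) xs ys

toBits :: Integer -> Bits
toBits 0 = []
toBits n = odd n : toBits (n `div` 2)

fromBits :: Bits -> Integer
fromBits = foldr (\b acc -> (if b then 1 else 0) + 2 * acc) 0

-- fromBits (badd False (toBits 27) (toBits 15))  ==  42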

Have fun with this. For more lambda experiments, see Oleg's lambda page. (For those who don't know, Oleg is a crazy-genius Haskell/Scheme programmer god. He implemented lambda calculus in hygienic Scheme macros in continuation-passing style.) These things have interesting mathematical properties, unlike my code above, which is just confusing. At least it's an example of the Strong Church-Turing Thesis at work, albeit not a very good one. If you have any questions, or for some reason test it and notice it doesn't work, or have anything else to say, please comment.

Sunday, April 22, 2007

Is Factor Turing-complete?

A rational person might respond, "Of course it's Turing complete, what an idiotic question. Why don't you go work on Unicode again?" Ignoring the second complaint, I think it's actually an interesting question. There's been a lot of theoretical writing about Joy, mostly by Manfred von Thun, and now about Cat by Christopher Diggins, but so far no rigorous definition of semantics. I'd like to change this.

As I see it, there are two ways to look at concatenative semantics. The first way is more widely theorized about. In both, there is a set of words and a set of possible data within the type system. In most analyses dealing with Joy (perhaps the finest of which is A Theory of Concatenative Combinators by Brent Kerby), quotations are seen as lists of words, data and other quotations which can freely be consed or unconsed. This exposes the full power of most concatenative languages over applicative languages or concatenative languages without quotations, like Forth. But, of course, it doesn't make them any more Turing-complete. Using this strategy, Kerby found that two combinators called k and cake could do everything--but "everything" was left undefined.

The other way of looking at it--the dark side, perhaps--is second-class quotations without introspection, which can't be included in data structures and can only be passed around on the stack. This, to some degree, resembles the compilable subset of Factor, since it is easier to compile quotations that are known at compile time. The simplest minimal set of combinators for this would be over, dip and drop (called zap in A Theory of Concatenative Combinators and pop in Joy). I'm not sure, but I think that to be Turing-complete, you'd need a little more. Conditionals ([ drop ] is true and [ nip ] is false) and recursion (dup call) are possible with these three words, which might make it complete. But if not, then four more words for manipulating lists could be added: ?, cons, uncons and f. Lists of lists of lists of f can represent anything! But are these needed? I'm not sure. Pi-calculus is equivalent to lambda calculus, as proven by Robin Milner in something I can't fully understand, despite the fact that there's no obvious way to represent data or even conditionals there (is it sending data to different channels? I'm not sure).

Anyway, no matter which you choose, there are some interesting reduction semantics that you can attach to concatenative languages that, like these, lack side effects. I got the idea from Robbert van Dalen's Enchilada (that's rapido from #concatenative). Traditionally, the semantics of stack languages have been described in terms of each word being a function from [a tuple of] stack[s] to [a tuple of] stack[s]. But who needs stacks? They're an implementation detail! Let's just apply successive reductions to a quotation like all those fancy professors in lambda and pi calculus do! We don't even need to care about evaluation order; we can leave it undefined, or define other orders beyond the traditional front-to-back don't-evaluate-inside-quotations order. I haven't fully thought this through, though. Aside from the pesky questions of halting and side effects, evaluation order doesn't matter because the composition operation is associative (it forms a monoid, as Manfred von Thun explained).

Just to make things interesting, I'm gonna call dip U (for under), drop P (for pop) and over O. Below is the definition of a word N which gives the not of a Church boolean. First come the reduction rules, then some derived combinators and example reductions, and finally the reduction of N itself. Note that the definitions below are not fixed points, which would complicate the semantics, but merely abbreviations.


-- Abstract syntax: --
Word := [Term] | O | P | U
Term := Word*

-- Reduction rules: --
Where q, r, s range over quotations (data),
and A ranges over actions (by which I mean sequences of words and quotations).
The following rules may be applied in any order within a term:

q r O
------
q r q

q P
---

q [A] U
-------
A q

Apply these rules until no further reduction is possible

-- A few derived combinators: --
D = []O[P]U -- dup
C = DUP -- call
I = [P]U -- nip
S = O[I]U -- swap
R = [S]US -- rot
T = [P] -- true
F = [I] -- false
N = FTRC -- not

-- Hey, why bother with all of those helpers? All at once now! --
-- For clarity, I'll substitute things incrementally --
D = []O[P]U -- dup
C = []O[P]UUP -- call
I = [P]U -- nip
S = O[[P]U]U -- swap
R = [O[[P]U]U]U O[[P]U]U -- rot
T = [P] -- true
F = [[P]U] -- false
N = [[P]U] [P] [O[[P]U]U]U O[[P]U]U []O[P]UUP -- not


-- A simple example: --
For any two quotations q and r:

q r S
q r O[[P]U]U
q r q [[P]U]U
q r [P]U q
q P r q
r q

Thus we have proven that swap does what we want it to. For rot, we have

q r s R
q r s [S]US
q r S s S
r q s S
r s q

which is the behavior we wanted. Similarly, for C (call) on action A we have

[A]C
[A][]O[P]UUP
[A][][A][P]UUP
[A][]P[A]UP
[A][A]UP
A[A]P
A

These can be used in the reduction below.

-- The actual reduction: --
(when applied to [P] which is true, so the result should be [[P]U] or false)

First, using previously examined words:
[P] N
[P] [[P]U] [P] R C
[[P]U] [P] [P] C
[[P]U] [P] P
[[P]U]

Now, all at once, in just 15 steps!
[P] [[P]U] [P] [O[[P]U]U]U O[[P]U]U []O[P]UUP
[P] [[P]U] O[[P]U]U [P] O[[P]U]U []O[P]UUP
[P] [[P]U] [P] [[P]U]U [P] O[[P]U]U []O[P]UUP
[P] [[P]U] [P]U [P] [P] O[[P]U]U []O[P]UUP
[P] P [[P]U] [P] [P] O[[P]U]U []O[P]UUP
[[P]U] [P] [P] O[[P]U]U []O[P]UUP
[[P]U] [P] [P] [P] [[P]U]U []O[P]UUP
[[P]U] [P] [P] [P]U [P] []O[P]UUP
[[P]U] [P] P [P] [P] []O[P]UUP
[[P]U] [P] [P] []O[P]UUP
[[P]U] [P] [P] [] [P] [P]UUP
[[P]U] [P] [P] [] P [P] UP
[[P]U] [P] [P] [P] UP
[[P]U] [P] P [P] P
[[P]U] [P] P
[[P]U]


Have fun deciphering all that! There are three things that I want to prove in this system. The first is trivial: for any algorithm, there are infinitely many programs which achieve the same reduction (this can be done by putting []P wherever you feel like, most easily at the end). The second thing is almost trivial: these three combinators can produce not only any permutation, but also any arbitrary duplication or deletion moved around to anywhere, for example abcd--dcaca. This can be done by first showing that these transformations can be represented by a series of duplications and deletions at particular points, together with rotations between certain elements and the top, and then showing that those operations can be implemented with only O, P and U. The third and final proof would be non-trivial: show that O, P and U are Turing-complete. I'm not even sure whether they are or not, but I suspect they are. An injection from SK-calculus or lambda calculus to this would probably be the simplest way to prove it. This is harder than it sounds, as SK-calculus is usually thought of as containing function values created at "runtime" (e.g. K x evaluates to λ_.x), while no such direct equivalent exists in {O,P,U}.
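
And if you'd rather have a machine do the deciphering, here's a tiny interpreter for the {O, P, U} rules that always applies the leftmost applicable rule, sketched in Haskell. It's my own untested toy (none of these names exist anywhere else), but it follows the three reduction rules above directly:

data Word' = O | P | U | Quot [Word']   -- a word is O, P, U or a quotation
  deriving (Eq, Show)

type Term = [Word']

-- One leftmost reduction step, if any rule applies;
-- quotations are not reduced internally.
step :: Term -> Maybe Term
step (Quot q : Quot r : O : rest) = Just (Quot q : Quot r : Quot q : rest)  -- q r O -> q r q
step (Quot _ : P : rest)          = Just rest                               -- q P   ->
step (Quot q : Quot a : U : rest) = Just (a ++ (Quot q : rest))             -- q [A] U -> A q
step (w : rest)                   = (w :) <$> step rest
step []                           = Nothing

-- Keep rewriting until no rule applies.
reduce :: Term -> Term
reduce t = maybe t reduce (step t)

-- The derived combinators from the post.
dup', call', nip', swap', rot', true', false', not' :: Term
dup'   = [Quot [], O, Quot [P], U]        -- D = []O[P]U
call'  = dup' ++ [U, P]                   -- C = DUP
nip'   = [Quot [P], U]                    -- I = [P]U
swap'  = [O, Quot nip', U]                -- S = O[I]U
rot'   = [Quot swap', U] ++ swap'         -- R = [S]US
true'  = [Quot [P]]                       -- T = [P]
false' = [Quot nip']                      -- F = [I]
not'   = false' ++ true' ++ rot' ++ call' -- N = FTRC

-- reduce (true' ++ not') == false'   (not true reduces to false)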

Of course, it'd be interesting to do this with cake and k too. But I haven't thought about that as much. There's a translation from a sort of "lambda" defined on that in A Theory of Concatenative Combinators, but it's not very well-defined, and may not be a general enough translation to the base of {i, dip, cons, dup, zap} to prove that it's Turing-complete. I hope Kerby works on this more in the future.

Update: In an earlier version, I had D = O[P]U instead of D = []O[P]U, which was incredibly stupid and caused me all kinds of problems. (It all reduced to nothing.) Now it's fixed, and the explanation is clearer, hopefully. Also, I added the potential problems at the end.

Saturday, April 14, 2007

More on assocs

So, assocs are done! I had a few problems, but it ultimately worked out. I think they'll be included in .89, but I can't say for sure; maybe .90. The final protocol ended up being a little bit different than the last time I wrote about them. Currently, these are the words that should be implemented:

GENERIC: at* ( key assoc -- value/f ? )
GENERIC: set-at ( value key assoc -- )
GENERIC: new-assoc ( size exemplar -- newassoc )
G: assoc-find ( assoc quot -- key value ? )
1 standard-combination ; inline ! quot: key value -- ?
GENERIC: delete-at ( key assoc -- )
GENERIC: clear-assoc ( assoc -- )
GENERIC: assoc-size ( assoc -- n )
GENERIC: assoc-like ( assoc exemplar -- newassoc )
! Additionally, clone should be implemented properly

The major changes from the protocol of a few days ago are the addition of assoc-like (which will increase efficiency in some cases) and the removal of key? (which can easily be, and is, implemented as at* nip). Additionally, I did some renaming: assoc* => at*, set-assoc => set-at and remove-assoc => delete-at.

It was a bit annoying to deal with the Factor internals, but 90% of the problems ended up being my fault anyway. I changed all the code in the Factor distribution (except for Ed's, because I heard he would want to do it himself. But if you're reading this, Ed, I'd be fine with changing it if you want me to) and all the unit tests passed--except the ones that failed to begin with. But Factor programmers probably want to know what these changes mean, too. So the rest of this post will be a summary of those changes.

The first thing to know is vocabularies. I moved everything that didn't specifically have to do with hashtables into the assocs vocab. All of the remaining words have hash in the name except for associate (which makes a new hashtable) and ?set-hash (which might make a new hashtable). Except for ?set-at everything in that vocabulary is keeping the same name. Any conforming implementation of the assocs protocol can expect to have all of the other words in the assocs vocab work properly. And most of them correspond to old, pre-existing hashtable words; they just work on more datatypes now. Some names weren't changed at all, like cache, which is suitably generic already.

In the taxonomy of renaming, there are three groups: dropped prefixes, hash => at changes, and hash => assoc changes. These, together with a few oddball changes, are described below.

Old word            New word(s)

Dropped prefixes:
hash-update         update
hash-intersect      intersect
hash-union          union
hash-keys           keys
hash-values         values

hash => at:
hash                at
hash*               at*
?hash               ?at
?hash*              ?at*
set-hash            set-at
change-hash         change-at
hash+               at+
remove-hash         delete-at
?remove-hash        ?delete-at

hash => assoc:
hash-each           assoc-each
hash-map            assoc-map
hash-each-with      assoc-each-with
hash-all?           assoc-all?
hash-all-with?      assoc-all-with?
hash-subset         assoc-subset
hash-subset-with    assoc-subset-with
hash-empty?         assoc-empty?
hash-stack          assoc-stack
clear-hash          clear-assoc
hashtable=          assoc=
subhash?            subassoc?
map>hash            H{ } map>assoc
make-hash           H{ } make-assoc

Oddballs:
hash-member?        key?
hash>alist          >alist
alist>hash          >hashtable


Of course, there's a difference: every single one of these words is more general and will work on all assocs, not just hashtables. If you have any questions or problems about this change, feel free to ask.

Tuesday, April 10, 2007

Why do we need data structures?

Right now, I'm facing a particularly annoying and indecipherable bug in integrating the assocs protocol with the rest of the Factor system (I have no idea what it is; it just doesn't work), so I decided to write another one of those stupid essays you all hate.

A couple nights ago, I was trying to explain to my mom what I was programming, the associative array protocol. I explained it badly, but a better way to put it would be this: imagine you're computerizing a library. The main thing you have to do is make it so that people can look up the call number of books. Everyone already knows the title and correct spelling of the book they want to get. Your job is to write a program that, given the title, will look up the call number for you. It has to do this quickly, even if there are a lot of books in the library. Of course, to do this, you need an associative array. This is obvious to anyone who's done much programming. I'm sure most of you already have your favorite structure picked out.

But why do we need to structure the data, anyway? A programmer might say, "to allow data to be accessed with minimal asymptotic complexity." Basically, that means in as few steps as possible, even if there's a lot of data. Still, this doesn't completely mesh with how computers have been explained to the average person, which probably goes something like this. (This is about as far as the Magic School Bus gets when the class goes into a computer.)

First, input is given to the computer. The input is first in the RAM. It is in ones and zeroes, because that's easier for the computer to handle. To process the data, some ones and zeroes are sent to the CPU, together with some instructions, which write other ones and zeroes back to the RAM. The ones and zeroes represent numbers in base 2. Then, when you want to save the data, those ones and zeroes are taken from the RAM and written to the hard drive.

Then, many people understand the higher level stuff about computers: the Sony VAIO is a good laptop, this is how you use Microsoft Word, Mac OS X is based on Unix, etc.

In the middle, there's a huge gap of information that only specialists know about. Right above the bottom of that gap sit data structures. Data structures, along with file formats, are the way those ones and zeroes are organized. If we only had numbers, we really couldn't do much with computers.

The basis for all data structures is the pointer. Basically, every single location in RAM has a number describing where it is. A pointer is a number that points to one of those positions in RAM. So the simplest thing you can do is say that a particular pointer doesn't just point to one number; it points to a number, and then the number right after it in RAM. And why limit yourself to just one? Why not 100? This pointer, the pointer to location 42313, represents the numbers at all of the locations 42313-42412. To get the 53rd item in the array, you just look up what number is at location 42313+53 (counting starts at 0, not 1). This whole thing is called an array.
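
Just to make the "pointer plus offset" idea concrete for the programmers reading along, here's a tiny Haskell sketch of mine using its raw-memory primitives (the particular numbers are made up):

import Foreign.Marshal.Array (newArray)
import Foreign.Storable (peekElemOff)

main :: IO ()
main = do
  arr <- newArray [1000 .. 1099 :: Int]  -- 100 numbers laid out consecutively
  x   <- peekElemOff arr 53              -- "start address plus 53 slots"
  print x                                -- prints 1053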

Data structures get much more complicated, but the idea remains the same: you have numbers, and then you have pointers to numbers, and even pointers to pointers to numbers. Why so many pointers and so many ways of organizing them? Because if you have the right pointers, they take you to the right number, the data that you want. A good deal of computer scientists' work has been devoted to getting just the right way to orient pointers, and counting exactly how many steps they take to find the data.

Saturday, April 7, 2007

Assoc protocol

Still in my break from Unicode (I'll get back to it eventually...) I'm working on something really cool right now: a generic protocol for associative arrays! [For non-Factorers, a protocol is our word for an informal collection of generic words that, together, constitute something that is useful for a class to implement.]

I hear what you're saying. "That's not cool; that sounds really boring and technical. Also, why do you need a generic protocol in the first place?" To your first objection: remember when you first learned how to implement map on linked lists? Wasn't that so simple, yet useful and interesting? But then you realized that thousands of other beginning programmers wrote the exact same snippet of Scheme, and yours was pretty much useless. Well, I'm writing all of those things again, except in a way that no one else (to my knowledge) has done before. More on that later.

The reason it's useful is a bit more obvious. Imagine you have a piece of code which uses a hashtable as an associative array. Then, after writing 1000 lines of code, you realize (oops) that, fairly often, you have to iterate over your data in alphabetical order. Also, the data is all in strings, and there are many strings starting with the same word. A hashtable works, but it's definitely not the best option. So trie to the rescue! But now you have to go back and change all of your code from using the hash library to the trie library. It shouldn't be so much work to do that. With a generic protocol, all you have to change is the code initializing the trie/hashtable to use the appropriate type. As another example, imagine you're implementing a database. I have no idea how databases are implemented, but let's imagine you use an AVL tree if there are 1024 or fewer entries, and if there are more, you use a B-tree to represent the data. This way, the switch can be made internally without other functions having to care. A third use, probably the most immediate benefit, is giving the Factor core a really solid set of words for manipulating alists without polluting the namespace, complete with all the derived operations you could want.

Well, those may be a little contrived, but back to the fun part, doing map all over again! Here's the current incarnation of the protocol that all assocs (our abbreviation for associative array) use:

GENERIC: assoc* ( key assoc -- value/f ? )
GENERIC: set-assoc ( value key assoc -- )
GENERIC: key? ( key assoc -- ? )
GENERIC: new-assoc ( size exemplar -- newassoc )
G: assoc-find ( assoc quot -- key value ? )
1 standard-combination ; inline ! quot: key value -- ?
GENERIC: remove-assoc ( key assoc -- )
GENERIC: clear-assoc ( assoc -- )
GENERIC: assoc-size ( assoc -- n )
! Additionally, clone must be implemented properly

Most of those have a clear analogue in the current hashtable implementation. key? is hash-member? and H{ } new-assoc is <hashtable>. The only really different operation is assoc-find, which at first looks like a rather curious choice for the protocol.

Probably the most confusing part of assoc-find is that it doesn't use the standard GENERIC: declaration, instead using G: assoc-find 1 standard-combination ;. The purpose of this is to make it dispatch off the second item on the stack instead of the first one. G: some-generic-word 0 standard-combination ; is equivalent to the normal GENERIC: some-generic-word.

The idea of assoc-find is to be analogous to find. It traverses the assoc in order (if an order exists), returning the first key/value pair that the given predicate succeeds on. It also returns a boolean indicating whether anything was found, which is necessary in the unlikely event that the matching pair has a key of f and a value of f. This probably seems strange because there currently is no hash-find in the Factor library, and nothing has needed it. But on top of assoc-find, any other useful combinator can easily and efficiently be built.

Think about it this way: what single combinator would give you all of the following features:

  • The ability to execute a quotation on each key/value pair of an assoc (needed for assoc-each)

  • The ability to test, in a short-circuited way, whether a particular predicate succeeds on any or all key/value pairs (needed for assoc-all?)

  • The ability to return the key/value pair associated with the success of a predicate (well, this is the exact function of assoc-find in the first place, but it's needed for rassoc)


To all these problems, assoc-find is the most natural solution. The strategy for implementing assoc-each? Just call the quotation given, and then return f to assoc-find so it keeps going. At the end, drop all three outputs. Here's the implementation:

: assoc-with 2swap [ >r -rot r> call ] 2keep ; inline

: assoc-find-with ( obj assoc quot -- key value ? )
swap [ assoc-with rot ] assoc-find
>r >r 2nip r> r> ; inline

: assoc-each ( assoc quot -- )
swap [ rot call f ] assoc-find-with 3drop ;

(Yes, I did steal assoc-with from the existing hash-with code verbatim.)

Inspired by the sequences protocol that already exists in Factor, I can, in this style, implement all the other operations just once and have them work on every type of assoc. I'm unaware of any similarly designed system. In Java, there's a Map interface, but since interfaces and abstract classes are distinct in Java, and Map is not an abstract class, there are no shared method bodies between Map implementations. In Ruby and Python, you can overload the [] or __getitem__ methods (and many other languages have similar operator overloadings), but there's no organized protocol or shared code around this. Common Lisp uses different functions for different data structures. The two I'm still wondering about are Smalltalk and Haskell, because an assoc superclass or typeclass would make a huge amount of sense in those languages especially, though the Haskell one would be limited to some degree. If you know anything about these or other things, I'd be really interested to hear your comments. Even if another language has a similar assoc protocol, I'd be surprised if they made the same weird choice to use assoc-find as their fundamental traversal operation.
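
Since I brought Haskell up: here's roughly what I imagine such a type class could look like, with a find operation as the single method and everything else derived from it. This is just my own sketch (all the names are mine, not any existing library); a side-effecting assoc-each doesn't translate directly, so the derived operations shown are the pure ones.

import Data.Maybe (listToMaybe)

-- The class: one fundamental traversal, the analogue of assoc-find.
class Assoc a where
  assocFind :: (k -> v -> Bool) -> a k v -> Maybe (k, v)

-- Derived operations, written once for every instance.
assocAt :: (Assoc a, Eq k) => k -> a k v -> Maybe v
assocAt key = fmap snd . assocFind (\k _ -> k == key)

assocAny :: Assoc a => (k -> v -> Bool) -> a k v -> Bool
assocAny p = maybe False (const True) . assocFind p

-- The simplest instance: a plain association list.
newtype AList k v = AList [(k, v)]

instance Assoc AList where
  assocFind p (AList kvs) = listToMaybe (filter (uncurry p) kvs)

An assoc-each analogue would be a fold or a monadic traversal rather than the return-f trick, since Haskell won't let the quotation have side effects, but the shape of the idea is the same.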

Another thing I'm wondering about is the possibility of "virtual assocs," analogous to the virtual sequences already present in Factor. The basic idea is, to have a sequence, all you really need are nth and length operations, maybe also a set-nth. This allows for cool things like, say, a reversed virtual sequence, which merely modifies indexing so that you are looking from the end instead of the beginning. There is no need to actually copy the underlying sequence. So why not do the same for assocs? It'd definitely be possible, but I don't know if it would ever be useful. I can't really think of a use, though it'd always be easy to implement one once someone thinks of one.
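
For what it's worth, a "virtual assoc" falls out almost for free from the type class sketched above (again, just a toy example of mine, reusing Assoc and listToMaybe from that sketch): the values are never stored anywhere, they're computed from the keys on demand.

data FnAssoc k v = FnAssoc [k] (k -> v)

instance Assoc FnAssoc where
  assocFind p (FnAssoc keys f) =
    listToMaybe [ (k, f k) | k <- keys, p k (f k) ]

-- assocAt 7 (FnAssoc [1 .. 10] (\k -> k * k))  ==  Just 49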

Anyway, this isn't much of a new idea, just a recombination of a bunch of old ones. It's really a great example of the benefits of generic words. It's interesting to note that this would be just as easy to implement in a class-based system as it would be in a typeclass system, though for initial rough development, it's always easier to use generic words or typeclasses which can be defined after the fact.

Update: It turns out that, in Haskell, a library called Edison implements the same sort of thing that I've done with type classes. It's made by (who else?) Chris Okasaki, author of Purely Functional Data Structures. Unfortunately, I can't fully understand either that book or Edison (at least from the associated documentation). I'm sure the library makes mine look extremely crude by comparison, but I think it's a good thing to have your library be relatively simple and comprehensible to someone who doesn't care about a typeclass hierarchy, just about a couple functions. At the same time, I see how the type classes make sense to allow for fine-grained functionality differences.

Sunday, April 1, 2007

My awesome idea

Building on what I learned before about computational complexity (see previous blog post) I'm now making a great new device that could infer computational complexity! It'll be an Eclipse plugin; just press Ctrl+Shift+F7 while selecting a method and a message box pops up, showing, in big-O notation, how long it takes to run. To implement this, I'll need to use a lot of Java's metaprogramming techniques, but once I have that out of the way, it'll be simple. And it'll work for everything.

April fools!

Actually, this is fundamentally impossible to make. Why? The answer goes back to Alan Turing and the halting problem. The question of the halting problem is, is it possible to construct a program that can tell if another program will halt? Well, consider this program (in Wikipedia-style pseudocode):

function example()
    if halts?(example) then
        infinite-loop()
    else
        return nil

So, what happens when we run halts?(example)? Well, if this returns true, then example itself should go into an infinite loop, not halting. If it returns false, indicating that the example loops forever, then it in fact quickly returns nil and halts. So there's no real answer to whether this halts. And--it's not hard to see--there are (countably) infinitely many similar programs that exploit the same loophole.
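
If you'd rather see it in a real language, here's the same construction sketched in Haskell. Of course, halts is a stand-in that can never actually be implemented (that's the whole point); it's stubbed out here just so the sketch compiles.

halts :: a -> Bool
halts _ = error "no real implementation of this can exist"

example :: ()
example = if halts example then loopForever else ()
  where loopForever = loopForever   -- the infinite loop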

When I first learned this, I thought, "This is cheating; that's too easy." Well, yes. That's higher-level math for you. But there's another way of thinking about it, without that cheating device: to tell whether absolutely any piece of code halts, you basically have to know everything about that code and run it in some sort of emulator. There are cases where you can prove a program doesn't halt, but in general, if it unexpectedly fails to halt while you're running it, your analysis never finishes either.

Building off of this, think about complexity: the first thing you need to know before assessing complexity is whether an algorithm halts. But we just showed that we can't do that! Also, what if the code does something like this:

function example2(n)
    if complexity(example2) = O(n) then
        return new integer[n][n] // this takes O(n^2)
    else if complexity(example2) = O(n^2) then
        return new integer[n] // this takes O(n)

and we want a tight bound for complexity.

Of course, specific solutions are possible, but these are both difficult to construct and highly limited in function. I hope you had fun with your daily dose of pessimism!