Monday, November 5, 2007

The currying theorem

My favorite class in college right now is Introduction to Mathematical Structures, which explains the basics of naive set theory and theorem proving. Today, we were studying cardinal arithmetic when a startling proposition appeared on the board:

Is there a bijection h: {M -> {L -> K}} -> {(M × L) -> K}

The notation in this post is described at the end

Immediately, I shouted out "That's just like currying!" (I blogged about currying a little while ago.) My professor, always mildly annoyed when I shouted out such random things, said "what?" to which I mumbled "never mind."

Some set theory

Since my class uses only naive set theory, a simple definition of a set suffices: a set is something with one well-defined operation: a membership test. If A is a set, and a is something, then we can tell if a is a member A or a is not a member of A.

One thing you can build up using sets is a Cartesian product. I won't go into the details, but it's well-defined to have an ordered pair (a, b). With sets A and B, the Cartesian product of A and B is the set of all (a, b) for any a in A and b in B.

A function f: A -> B can be defined as a subset of this Cartesian product A × B, where f(a) = b if (a, b) is in f. There are additional requirements on the function f, though: it must be total (meaning, for each a in A, there is an (a, b) for some b in B) and it must be a function (meaning, for any a in A, there is only one b in B such that (a, b) is in f).

Two sets A and B are considered equinumerous if there is a one-to-one correspondence between them. This means, for each element a in A, there is exactly one b in B which can be paired up to it, and vice versa. We can put this one-to-one correspondence into a function, called a bijection. Since functions are just subsets of A × B, a bijective function is a set of ordered pairs, one for each element of A, pairing it with a unique element of B and covering all of the elements of B. If the two sets can be paired up like this, we can consider them the same size.

So, what kind of properties does a bijection f: A -> B have? For one, it has to be a function defined on all A, corresponding to an element of B. Another property is that no two elements of A correspond to the same element of B; this is called injectivity. A third property is that every element of B has to have at least one element of A corresponding to it; this is called surjectivity.

This is all a little complicated, so let's take a simple example. Let's show that the natural numbers N (this is the set containing 1, 2, 3, 4, and so on) are equinumerous to the even natural numbers E (2, 4, 6, 8, ...). If this is true, that means there are just as many natural numbers as even numbers, a very counterintuitive proposition. But there's a very simple function f: N -> E which demonstrates a pairing of them:

f(x) = 2x

All this does is pair a natural number to two times its value, which will be an even number. We can visualize f's pairing as the set containing (1, 2), (2, 4), (3, 6), (4, 8) and so on. To prove, mathematically, there are three things we need to show:

f is a total function: for any natural number n, there is obviously only one e such that e = 2n. And, obviously, for any n, it's well-defined to talk about 2n.

f is injective: This means that, for any e, there is no more than one n which corresponds to it. Mathematically, it's easy to show this by demonstrating, if f(n1) = f(n2), then n1 = n2. In this case, f(n1) = f(n2) implies 2⋅n1 = 2⋅n2, meaning that n1 = n2.

f is surjective: For every even number e, we need to demonstrate that there is a corresponding natural number n such that f(n) = e. This shows that f covers all of E. In this case, it's very easy to show that: all you need to do is divide an even number by two to get a natural number that, when doubled, yields that same even number.

What?

So then what does that have to do with currying? Let's look at that original proposition again:

Is there a bijection h: {M -> {L -> K}} -> {(M × L) -> K}

If this is true, it means that the set of functions {M -> {L -> K}} is equinumerous to the set of functions {(M × L) -> K}, which means, basically, that they are equivalent and represent each other.

Let's think about what these two sets represent. {M -> {L -> K}} is the set of functions from M to the set of functions from L to K. Wait a minute: that's just like a curried function! That's just like f(m)(l) = k: a function which returns a function to take the second argument. {(M × L) -> K} is the set of functions which take an ordered pair (m, l) and return an element of K, basically f(m, l) = k.

So, if this bijection h exists, it could be the uncurry function: it takes a curried function and makes a normal one out of it. Well, by what we know, there are other things it could do, but this is what would make the most sense. Let's define this uncurry function:

h(f)(m, l) = f(m)(l)

Now, is this a bijection? If it is, that means that curried functions are essentially equivalent to uncurried functions, meaning that it's valid to curry things. This is what makes it possible—easy, even—to have some languages be auto-currying (eg OCaml) and some languages not be (eg SML), and despite that use the same basic programming style. To prove that h is a bijection, we can use the same proof outline as the above demonstration that the even numbers are equinumerous to the natural numbers:

h is a total function: for any function f: M -> {L -> K}, there is only one g such that g(m, l) = f(m)(l); if there were another function which satisfied this property, it would equal g. And for any f: M -> {L -> K}, it is valid to call that function on m, where m is in M, and call that result on l, where l is in L.

h is injective: Another way we can show injectivity is by showing that if f ≠ g, then h(f) ≠ h(g). This is equivalent to what was shown before. Remember, f and g: M -> {L -> K}. So, if f ≠ g, then there is some m in M such that f(m) ≠ g(m). This implies that there is some l in L such that f(m)(l) ≠ g(m)(l). We know that f(m)(l) = h(f)(m, l), and that g(m)(l) = h(g)(m, l), by the definition of h. Therefore, h(f)(m, l) ≠ h(g)(m, l), meaning h(f) ≠ h(g). This demonstrates that h is injective: each different curried function is paired up with a different non-curried function.

h is surjective: For h to be surjective, there has to be a curried function corresponding to every uncurried function. Let's call the curried function f and the uncurried function u. If we have an uncurried function, we can easily pull the curried one out with the currying operation c:

c(u)(m)(l) = u(m, l)

So, for every uncurried function u, the corresponding curried function is c(u). If there is always an f such that h(f) = u for any u: (M × L) -> K, then h is surjective. Let's show that c(u) is that f. h(c(u))(m, l) = c(u)(m)(l) = u(m, l), so h(c(u)) = u. Since c(u) is defined for any u of the appropriate function type, h is surjective.

In case you lost track, we just proved that currying makes sense.

Exercise to the reader: Prove that the function h defined above is the only bijection between {M -> {L -> K}} and {(M × L) -> K} which works for any set M, L and K with no additional knowledge. Hint: I don't know how to do this, but I'm pretty sure it's true.

Cardinal arithmetic

When you get into all this complex mathematics, you can do something really cool: arithmetic. Let's redefine numbers completely, ignoring what we know already. A number, like 1, 2, 3, etc, is the set of all sets of that size. (Actually, scratch that—we can't really do that. Let's stipulate that these sets are subsets of some unspecified, but really big, U, and then keep going as if nothing ever happened. This helps us avert a gigantic contradiction.) More specifically, 1 is the set of sets which are equinumerous (remember, bijection) to the set containing just 1; 2 is the set of sets which are equinumerous to the set containing 1 and 2; etc. We use the notation |A| to refer to the cardinality, or this weird number system's value for the size, of A. So, if A contains the numbers 42, π, 1984 and nothing else, |A| = 3 since there is a bijection between A and the set containing 1, 2, 3, as well as a whole bunch of other sets: they all have the same number of elements.

It's easy to build up some basic operations for these cardinal numbers. If sets A and B are disjoint, |A| + |B| = |the set of anything in either A or B, or both|. For any set A and B, |A|⋅|B| = |A × B|. For any set A and B, |A||B| = |{B -> A}|. It's useful to imagine these with some small finite sets to verify that they make sense.

Now, we can show a whole bunch of basic identities we already knew about positive integers make sense here over all cardinal numbers. Here's one: for any cardinal numbers κ, λ, and μ: (κλ)μ = κλ⋅μ. So, what's the equivalent statement when translated into a fact about sets?

How does this look in programming?

Going back to the practical(ish) side, currying and uncurrying actually comes in handy when programming Haskell. Here's an exerpt from the Haskell Prelude:
`-- curry converts an uncurried function to a curried function;-- uncurry converts a curried function to a function on pairs.curry            :: ((a, b) -> c) -> a -> b -> ccurry f x y      =  f (x, y)uncurry          :: (a -> b -> c) -> ((a, b) -> c)uncurry f p      =  f (fst p) (snd p)`

As an example of the use of uncurry, here's an implementation of `zipWith` (not from the prelude) which doesn't duplicate the logic of `map`, assuming an existing implementation of `zip`:
`zipWith :: (a->b->c) -> [a]->[b]->[c]zipWith f a b = map (uncurry f) \$ zip a b`

Overview of notation
• Capital letters (eg K, L, M) are sets.
• Lower case letters are functions (eg f, g, h) or members of sets (k, l, m).
• Greek lowercase letters (eg κ, λ, μ) are cardinal numbers.
• A × B is the Cartesian product of A and B.
• f: A -> B means f is a total function from A to B.
• {A -> B} is the set of functions from A to B. (I made this notation up)

Maybe I'll never be the Good Math, Bad Math guy, but I still enjoy the chance to explain stuff. Maybe this will make up for the stupidity of my last post.

1 comment:

Anonymous said...

Thanks for the explanation. I had to find a bijection in the reverse direction but your post was really helpful. I thought it looked like currying as well but couldn't come up with the proper proof.