
Monday, February 11, 2008

Entscheidungsproblem gelöst! New answer and FAQ

Alan Turing wasn't quite right: anything that can be executed in the physical universe runs in constant time, and it's simple to design a mechanism that tests whether an algorithm halts, also in constant time (assuming the algorithm takes less than half of the universe to run).

Q: How can you make such a bold, stupid claim?
A: Because Turing assumed an infinite space to solve problems in. In the real world, there is a finite number of bits we can work with. Let's call this n. If a problem can be solved in the physical universe, it must be solvable using n or fewer bits of memory. There are 2^n possible states the bits can be in, so after 2^n memory writes, either an answer has been produced or the bits have returned to a state they were in before, which means an infinite loop. Since n is a constant, 2^n is a constant, so we know everything runs in worst-case O(1) time. For a more practical simulation, take n to be the number of bits in addressable memory, including virtual memory.

Q: How can you check if an algorithm will halt or not in constant time?
A: Divide the available memory in two. Half will be used for running the algorithm, and half will be a counter. Increment the counter on each memory write in the algorithm's memory space. If this counter overflows before the algorithm halts, we know there is an infinite loop. (Note: this can only handle algorithms that use at most half of the available memory, which is a smaller class of programs.)
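
Here's that checker as a minimal Haskell sketch, assuming a hypothetical step function over the algorithm's finite state space (Nothing means it halted). If it hasn't halted after 2^n steps, where n is the number of bits of state it can touch, some state must have repeated, so it loops:

data Result s = Halted s | Loops deriving Show

boundedHalts :: Int             -- n, the number of bits of state
             -> (s -> Maybe s)  -- one step; Nothing means "halted"
             -> s               -- initial state
             -> Result s
boundedHalts n step = go (2 ^ n :: Integer)
  where
    go 0 _ = Loops
    go k s = case step s of
        Nothing -> Halted s
        Just s' -> go (k - 1) s'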

Remember the big thing Turing proved: we can't have a program that tests whether other programs halt. If we did have that program, we could run it on a program that halts if the test says it doesn't halt, and doesn't halt if the test says it halts; either way there's a contradiction. This implicitly rests on the idea that program memory is unbounded. Running that second program using the halting-test method described above would require an unbounded amount of memory, because it would result in an unbounded recursion. We can just say that the program crashes, since it runs out of memory. In this world, not everything that's computable can be computed, but everything halts (or repeats a previous memory state exactly, which can be caught, as explained above).

Q: What do you mean by unbounded? And halt?
A: By "unbounded" I mean countable (meaning isomorphic as a set to the naturals). Basically, there is no limit, but no element itself is infinite. By "halt" I mean that the algorithm will stop processing and come to an answer. In this theoretical world of algorithms, note that there's no addtional external input once the algorithm begins.

Q: But the system you describe isn't Turing-complete!
A: Yes. The finite universe isn't strictly Turing-complete, either, and I think this is interesting to note. For example, there's no way we could model another universe larger than our own, down to quantum states, from this universe.

Q: So my computer science professor was lying to me all along! Computational complexity is useless. Why didn't they tell you this?
A: Not quite. Since this constant is so large (even when we shrink it down to 2 raised to the size of the available memory in bits), we can actually get a more informative bound than O(1) if we use, say, O(m) for the same algorithm. This is why you sometimes have to be careful about what your constants are in analysis; sometimes something which looks asymptotically faster is actually slower in not just small but all cases. Anyway, your professor probably didn't tell you this because it's both obvious and vacuous.

Q: So, if you were just tricking me, does this idea mean anything?
A: Actually, it comes up fairly often. Say you're doing arithmetic on integers. Each operation takes constant time, right? Well, it does if you're using machine integers. That's because machine integers have a fixed maximum size. The reason we can say it takes constant time is the same reason that the universe can calculate everything in constant time! In fact, even if we use bignums, if we say "this only involves numbers that can be represented in 256 bits or less," it still makes some sense to say that things happen in constant time. It's only when things are unbounded that it makes total sense. Sometimes you have to look at very large data points to find that, say, n^(log₄ 3) grows faster than n log₂ n. If the size of the input is bounded at 10 billion, there's no clear winner.
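
To make the machine-integer/bignum distinction concrete, here's a tiny Haskell illustration (just a sketch): Int is capped at a fixed machine size, so its operations take constant time, while Integer grows as needed and its operations get more expensive as the numbers grow.

main :: IO ()
main = do
    print (maxBound :: Int)                    -- the hard cap on machine integers
    print (2 ^ 256 :: Integer)                 -- a bignum far past any machine word
    print ((2 ^ 256) * (2 ^ 256) :: Integer)   -- work grows with operand size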

Think about sorting algorithms. If we have k possible items in an array of length n, we can sort the array in O(n) time and O(k) space using counting sort. Say we're sorting an array of machine integers, and we have no idea what the distribution is. Great! Just make an array of counters, one for each possible machine integer. Now iterate through the list and increment the array at the integer's index when you encounter that integer. What's the problem? Oh yeah, we just used up more than all of the addressable memory. So just because we can, theoretically, construct something that will solve our problem in linear (or, as in the original example, constant) time doesn't mean it'll work.
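
For reference, here's a minimal Haskell sketch of the counting sort just described, for keys known to lie in a range [lo, hi]; it's O(n + k) time and O(k) space, which is fine when k is small and hopeless when k is "every possible machine integer":

import Data.Array (accumArray, assocs)

countingSort :: (Int, Int) -> [Int] -> [Int]
countingSort (lo, hi) xs = concat [ replicate c v | (v, c) <- assocs counts ]
  where
    -- one counter per possible key
    counts = accumArray (+) 0 (lo, hi) [ (x, 1) | x <- xs ]

-- countingSort (0, 9) [3,1,4,1,5] == [1,1,3,4,5]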

Update: Added another question. Note to readers: The headline and first paragraph aren't to be taken seriously or directly! This is a joke to demonstrate a point, not a claim of mathematical discovery, and the original solution to the halting problem is an eternal mathematical truth.

Saturday, February 9, 2008

Factor[ial] 102

Previously, I wrote an introduction to Factor through factorial called Factor[ial] 101. If you saw that, you probably thought that's all you'd see of that totally un-compelling example. Well, you're wrong! We can actually implement factorial more simply, and more efficiently for large integers.

Let's look at the solution from last time:

: factorial ( n -- n! )
    1 [ 1+ * ] reduce ;

When you saw this, you might have been thinking, "Why not just get a list of the numbers from 1 to n and take their product? Why bother with 1+? In Haskell, it's just product [1..n]." We can actually use this strategy in Factor using the math.ranges library, which has a word called [1,b] that creates a virtual sequence containing the integers 1 through its argument. We can also use the word product in math.vectors to get the product of a sequence. So the new factorial is just

: factorial ( n -- n! )
    [1,b] product ;

Efficiency and bignums

I previously talked about how to make some simple mathematical functions work well with bignums. The example I used there was a string>number conversion procedure, but it applies equally to getting the product of a list. In short: when multiplying two bignums of size (in bits) n and m, there's a lower bound of Ω(n+m), since a bignum of about n+m bits must be constructed. So if we go about finding the product of a sequence by starting with 1, then multiplying that by the first element, then the second, and so on for the entire sequence, the accumulator gets rebuilt at every step, and in the worst case this takes Ω(n²) time, where n is the size of the resulting product!

We can do better, and a better strategy is to use binary recursion, similar to mergesort: split the sequence in half, find the products of both halves, and multiply them together. Then the easy lower bound is lower: Ω(n log n). (Note: the upper bounds for these algorithms are something like the lower bound times log n, with a reasonably efficient multiplication algorithm.)

So here's a better implementation of product, using this strategy:

: halves ( seq -- beginning end )
    dup length 2 /i cut-slice ;

: product ( seq -- num )
    dup length {
        { 0 [ drop 1 ] }
        { 1 [ first ] }
        [ drop halves [ product ] bi@ * ]
    } case ;
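
For comparison, here's the same balanced-product idea as a quick Haskell sketch (the name product' is made up; it mirrors the Factor word above):

product' :: [Integer] -> Integer
product' []  = 1
product' [x] = x
product' xs  = product' front * product' back
  where (front, back) = splitAt (length xs `div` 2) xs

-- factorial n == product' [1 .. n]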

Abstraction and combinators

If we want to implement a word to sum an array, we'll be repeating a lot of code. So let's abstract the basic idea of this kind of recursion into a combinator, or higher order function, that we can supply the starting value and combining function to. With this, we should be able to write

: product ( seq -- num ) 1 [ * ] split-reduce ;
: sum ( seq -- num ) 0 [ + ] split-reduce ;

where split-reduce is this new combinator. Now, it's no harder to write code using the binary-recursive strategy than the original naive strategy, if split-reduce is somewhere in the library. Here's how you can implement it:

: split-reduce ( seq start quot -- value )
    pick length {
        { 0 [ drop nip ] }
        { 1 [ 2drop first ] }
        [ drop [ halves ] 2dip [ [ split-reduce ] 2curry bi@ ] keep call ]
    } case ; inline

This looks a little messy, as combinators sometimes get. Let's see how it looks using local variables. (I'm using the locals vocab, which allows the syntax :: word-name ( input variables -- output ) code... ; for lexical scoping. I don't use this very often, but in implementing combinators it can make things much cleaner.)

:: split-reduce ( seq start quot -- value )
    seq empty? [ start ] [
        seq singleton? [ seq first ]
        [ seq halves [ start quot split-reduce ] bi@ quot call ] if
    ] if ; inline

Which one of these you prefer is purely a matter of taste.

A tangent to mergesort

What if we wanted to use split-reduce to implement mergesort? It might look like you can do this:

: mergesort ( seq -- sorted )
    { } [ merge ] split-reduce ;

However, there's a problem here: in the base case, if we have { 1 }, it'll be changed into 1. But we need the base case to output sequences! (Ignore the fact that 1 is a sequence; it's of the wrong type.) So the cleanest way to do this is to make a new word, split-process, which does the same thing as split-reduce but takes a new parameter specifying what to do in the base case. With this we're able to do

: split-reduce ( seq start quot -- value )
    [ first ] split-process ; inline

To implement this, we just need to modify split-reduce, factoring out the base case code:

:: split-process ( seq start quot base-quot -- value )
    seq empty? [ start ] [
        seq singleton? [ seq base-quot call ] [
            seq halves
            [ start quot base-quot split-process ] bi@
            quot call
        ] if
    ] if ; inline

Now mergesort can be implemented as

: mergesort ( seq -- sorted )
    { } [ merge ] [ ] split-process ;

for some suitable implementation of merge.
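
For concreteness, here's what a suitable merge of two sorted sequences looks like in Haskell (just a sketch; a Factor version would do the same job):

merge :: Ord a => [a] -> [a] -> [a]
merge [] ys = ys
merge xs [] = xs
merge (x:xs) (y:ys)
    | x <= y    = x : merge xs (y:ys)
    | otherwise = y : merge (x:xs) ys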

To binrec and beyond

What if we took this even further: why restrict this to binary recursion on sequences? We can do binary recursion on everything that needs binary recursion! So let's make a combinator out of this, calling it binrec. binrec takes four—four!—quotations. The first one specifies the termination (base case) condition. The second specifies what to do in the base case. The third specifies how to split up the data in the inductive case, and the fourth specifies how to put the two pieces back together after the recursion takes place. Here's how we can implement binrec for a totally general binary recursion combinator:

:: binrec ( data test end split rejoin -- value )
    data test call [ data end call ] [
        data split call
        [ test end split rejoin binrec ] bi@
        rejoin call
    ] if ; inline

In the abstract, this isn't too bad. But how can you read code that uses binrec? You have to keep track of four quotations, their intended stack effects, and their roles in the calculation. For me, this is too difficult in most cases.
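
To see the shape of the combinator with its roles spelled out as names, here's the same idea as a Haskell sketch:

binrec :: (a -> Bool)    -- test: is this a base case?
       -> (a -> b)       -- end: what to do in the base case
       -> (a -> (a, a))  -- split: divide the data
       -> (b -> b -> b)  -- rejoin: combine the two results
       -> a -> b
binrec test end split rejoin = go
  where
    go x
        | test x    = end x
        | otherwise = let (l, r) = split x in rejoin (go l) (go r)

-- e.g. a balanced sum:
-- binrec ((<= 1) . length) sum (\xs -> splitAt (length xs `div` 2) xs) (+) [1 .. 10] == 55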

Look at how we can define split-process in terms of binrec:

:: split-process ( seq start rejoin-quot end-quot -- value )
    seq
    [ dup singleton? swap empty? or ]
    [ dup singleton? [ end-quot call ] [ drop start ] if ]
    [ halves ] rejoin-quot binrec ; inline

This isn't actually easier than defining split-process directly, and you can argue that it's worse than the original version. Still, it provides an interesting way to avoid explicit recursion.

Pulling it all together

Complicated combinators like binrec can sometimes be useful, as long as you don't have to use them directly. One of the great things about Factor is that it's so easy to specialize these things. So why not? Almost every use of binrec follows a particular pattern.

We can tell everyone more loudly about split-reduce, which is much easier to use, and keep binrec hidden in the library for advanced users who want to implement their own similar combinators without repeating the code that's already written in binrec. It's not that the recursion is difficult to write; it's just that there's no reason to write this code more than once.

So that's how you implement factorial in Factor. Except once all this is in the library, all you have to worry about is [1,b] product.

(BTW If you actually want to use factorial for something practical, where it'll be called multiple times, a memoizing table-based approach might be faster. Or maybe Stirling's approximation is appropriate, depending on the use case. Or maybe one of these algorithms. But that's a topic for another blog post!)

Update: I fixed some typos, found by Slava. Also, Slava added split-reduce into the core as binary-reduce and implemented sum and product with it.

Update 2: Updated the code samples for the current version of Factor, as of late March 2009.

Monday, November 5, 2007

The currying theorem

My favorite class in college right now is Introduction to Mathematical Structures, which explains the basics of naive set theory and theorem proving. Today, we were studying cardinal arithmetic when a startling proposition appeared on the board:

Is there a bijection h: {M -> {L -> K}} -> {(M × L) -> K}?

The notation used in this post is described at the end.

Immediately, I shouted out "That's just like currying!" (I blogged about currying a little while ago.) My professor, always mildly annoyed when I shouted out such random things, said "what?" to which I mumbled "never mind."

Some set theory

Since my class uses only naive set theory, a simple definition of a set suffices: a set is something with one well-defined operation, a membership test. If A is a set, and a is something, then we can tell whether a is a member of A or a is not a member of A.

One thing you can build up using sets is a Cartesian product. I won't go into the details, but it's well-defined to have an ordered pair (a, b). With sets A and B, the Cartesian product of A and B is the set of all (a, b) for any a in A and b in B.

A function f: A -> B can be defined as a subset of this Cartesian product A × B, where f(a) = b if (a, b) is in f. There are additional requirements on the function f, though: it must be total (meaning, for each a in A, there is an (a, b) for some b in B) and it must be a function (meaning, for any a in A, there is only one b in B such that (a, b) is in f).

Two sets A and B are considered equinumerous if there is a one-to-one correspondence between them. This means, for each element a in A, there is exactly one b in B which can be paired up to it, and vice versa. We can put this one-to-one correspondence into a function, called a bijection. Since functions are just subsets of A × B, a bijective function is a set of ordered pairs, one for each element of A, pairing it with a unique element of B and covering all of the elements of B. If the two sets can be paired up like this, we can consider them the same size.

So, what kind of properties does a bijection f: A -> B have? For one, it has to be a function defined on all A, corresponding to an element of B. Another property is that no two elements of A correspond to the same element of B; this is called injectivity. A third property is that every element of B has to have at least one element of A corresponding to it; this is called surjectivity.

This is all a little complicated, so let's take a simple example. Let's show that the natural numbers N (this is the set containing 1, 2, 3, 4, and so on) are equinumerous to the even natural numbers E (2, 4, 6, 8, ...). If this is true, that means there are just as many natural numbers as even numbers, a very counterintuitive proposition. But there's a very simple function f: N -> E which demonstrates a pairing of them:

f(x) = 2x

All this does is pair a natural number with two times its value, which will be an even number. We can visualize f's pairing as the set containing (1, 2), (2, 4), (3, 6), (4, 8) and so on. To prove this mathematically, there are three things we need to show:

f is a total function: for any natural number n, there is obviously only one e such that e = 2n. And, obviously, for any n, it's well-defined to talk about 2n.

f is injective: This means that, for any e, there is no more than one n which corresponds to it. Mathematically, it's easy to show this by demonstrating that if f(n₁) = f(n₂), then n₁ = n₂. In this case, f(n₁) = f(n₂) implies 2⋅n₁ = 2⋅n₂, meaning that n₁ = n₂.

f is surjective: For every even number e, we need to demonstrate that there is a corresponding natural number n such that f(n) = e. This shows that f covers all of E. In this case, it's very easy to show that: all you need to do is divide an even number by two to get a natural number that, when doubled, yields that same even number.
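
To make the pairing concrete, here's a small Haskell sketch of f and its inverse (nothing deep, just the function and the spot checks above):

f :: Integer -> Integer
f n = 2 * n

fInv :: Integer -> Integer
fInv e = e `div` 2

-- map f [1 .. 4]               == [2,4,6,8]
-- map (fInv . f) [1 .. 4]      == [1,2,3,4]   (f has a left inverse: injectivity)
-- map (f . fInv) [2,4,6,8]     == [2,4,6,8]   (every even number is hit: surjectivity)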


What?

So then what does that have to do with currying? Let's look at that original proposition again:

Is there a bijection h: {M -> {L -> K}} -> {(M × L) -> K}?

If this is true, it means that the set of functions {M -> {L -> K}} is equinumerous to the set of functions {(M × L) -> K}, which means, basically, that they are equivalent and represent each other.

Let's think about what these two sets represent. {M -> {L -> K}} is the set of functions from M to the set of functions from L to K. Wait a minute: that's just like a curried function! That's just like f(m)(l) = k: a function which returns a function to take the second argument. {(M × L) -> K} is the set of functions which take an ordered pair (m, l) and return an element of K, basically f(m, l) = k.

So, if this bijection h exists, it could be the uncurry function: it takes a curried function and makes a normal one out of it. Well, by what we know, there are other things it could do, but this is what would make the most sense. Let's define this uncurry function:

h(f)(m, l) = f(m)(l)

Now, is this a bijection? If it is, that means that curried functions are essentially equivalent to uncurried functions, meaning that it's valid to curry things. This is what makes it possible—easy, even—to have some languages be auto-currying (eg OCaml) and some languages not be (eg SML), and despite that use the same basic programming style. To prove that h is a bijection, we can use the same proof outline as the above demonstration that the even numbers are equinumerous to the natural numbers:

h is a total function: for any function f: M -> {L -> K}, there is only one g such that g(m, l) = f(m)(l); if there were another function which satisfied this property, it would equal g. And for any f: M -> {L -> K}, it is valid to call that function on m, where m is in M, and call that result on l, where l is in L.

h is injective: Another way we can show injectivity is by showing that if f ≠ g, then h(f) ≠ h(g). This is equivalent to what was shown before. Remember, f and g: M -> {L -> K}. So, if f ≠ g, then there is some m in M such that f(m) ≠ g(m). This implies that there is some l in L such that f(m)(l) ≠ g(m)(l). We know that f(m)(l) = h(f)(m, l), and that g(m)(l) = h(g)(m, l), by the definition of h. Therefore, h(f)(m, l) ≠ h(g)(m, l), meaning h(f) ≠ h(g). This demonstrates that h is injective: each different curried function is paired up with a different non-curried function.

h is surjective: For h to be surjective, there has to be a curried function corresponding to every uncurried function. Let's call the curried function f and the uncurried function u. If we have an uncurried function, we can easily pull the curried one out with the currying operation c:

c(u)(m)(l) = u(m, l)

So, for every uncurried function u, the corresponding curried function is c(u). If there is always an f such that h(f) = u for any u: (M × L) -> K, then h is surjective. Let's show that c(u) is that f. h(c(u))(m, l) = c(u)(m)(l) = u(m, l), so h(c(u)) = u. Since c(u) is defined for any u of the appropriate function type, h is surjective.

In case you lost track, we just proved that currying makes sense.

Exercise to the reader: Prove that the function h defined above is the only bijection between {M -> {L -> K}} and {(M × L) -> K} which works for any set M, L and K with no additional knowledge. Hint: I don't know how to do this, but I'm pretty sure it's true.

Cardinal arithmetic

When you get into all this complex mathematics, you can do something really cool: arithmetic. Let's redefine numbers completely, ignoring what we know already. A number, like 1, 2, 3, etc, is the set of all sets of that size. (Actually, scratch that—we can't really do that. Let's stipulate that these sets are subsets of some unspecified, but really big, U, and then keep going as if nothing ever happened. This helps us avert a gigantic contradiction.) More specifically, 1 is the set of sets which are equinumerous (remember, bijection) to the set containing just 1; 2 is the set of sets which are equinumerous to the set containing 1 and 2; etc. We use the notation |A| to refer to the cardinality, or this weird number system's value for the size, of A. So, if A contains the numbers 42, π, 1984 and nothing else, |A| = 3 since there is a bijection between A and the set containing 1, 2, 3, as well as a whole bunch of other sets: they all have the same number of elements.

It's easy to build up some basic operations for these cardinal numbers. If sets A and B are disjoint, |A| + |B| = |the set of anything in either A or B, or both|. For any sets A and B, |A|⋅|B| = |A × B|. For any sets A and B, |A|^|B| = |{B -> A}|. It's useful to try these out on some small finite sets to verify that they make sense.

Now, we can show a whole bunch of basic identities we already knew about positive integers make sense here over all cardinal numbers. Here's one: for any cardinal numbers κ, λ, and μ: (κ^λ)^μ = κ^(λ⋅μ). So, what's the equivalent statement when translated into a fact about sets?
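
(By the exponentiation rule above, with κ = |K|, λ = |L|, μ = |M|, the left side counts {M -> {L -> K}} and the right side counts {(M × L) -> K}, so it's the currying bijection again.) Here's a small Haskell spot-check of the identity on finite sets, enumerating functions as lists of outputs, one per domain element (a sketch, with made-up sizes):

import Control.Monad (replicateM)

-- The number of functions from a domain of size d into a codomain cod.
countFuns :: Int -> [a] -> Int
countFuns d cod = length (replicateM d cod)

main :: IO ()
main = do
    let k = "ab"   -- |K| = 2 (kappa)
        l = 3      -- |L| = 3 (lambda)
        m = 2      -- |M| = 2 (mu)
    print (countFuns m (replicateM l k))  -- (kappa^lambda)^mu: M -> {L -> K}, 64
    print (countFuns (m * l) k)           -- kappa^(lambda*mu): (M x L) -> K, 64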

How does this look in programming?

Going back to the practical(ish) side, currying and uncurrying actually come in handy when programming in Haskell. Here's an excerpt from the Haskell Prelude:

-- curry converts an uncurried function to a curried function;
-- uncurry converts a curried function to a function on pairs.

curry :: ((a, b) -> c) -> a -> b -> c
curry f x y = f (x, y)

uncurry :: (a -> b -> c) -> ((a, b) -> c)
uncurry f p = f (fst p) (snd p)

As an example of the use of uncurry, here's an implementation of zipWith (not from the prelude) which doesn't duplicate the logic of map, assuming an existing implementation of zip:

zipWith :: (a->b->c) -> [a]->[b]->[c]
zipWith f a b = map (uncurry f) $ zip a b




Overview of notation
  • Capital letters (eg K, L, M) are sets.
  • Lower case letters are functions (eg f, g, h) or members of sets (k, l, m).
  • Greek lowercase letters (eg κ, λ, μ) are cardinal numbers.
  • A × B is the Cartesian product of A and B.
  • f: A -> B means f is a total function from A to B.
  • {A -> B} is the set of functions from A to B. (I made this notation up)



Maybe I'll never be the Good Math, Bad Math guy, but I still enjoy the chance to explain stuff. Maybe this will make up for the stupidity of my last post.

Sunday, September 23, 2007

Using bignums efficiently

The other day, Doug Coleman got on the #concatenative IRC channel complaining of a horrible bug: when he put a 100,000-digit prime number in a file and then tried to load the file, Factor hung. Doug speculated that this was a compiler bug, but I had another idea: the parser wasn't processing bignums efficiently. First, a little background. This article presumes some basic knowledge of computational complexity and big-O notation, which you should read up on if you don't know about them already.

Bignums and performance

A 'bignum' is Factor's term (grabbed from Lisp terminology) for an arbitrary-size integer bigger than the standard integer. Integers which do fit in a word (actually, a word minus 3 bits for the header) are called 'fixnums'. On any platform that Factor currently supports, you can count on the fact that a number smaller than 2^29 will be a fixnum, and a number bigger than 2^61-1 will be a bignum.

In most situations, this is an irrelevant implementation detail. In Factor, bignums are used with the same functions as fixnums (and all other builtin numeric types). But there is a subtle performance issue. On fixnums, it takes (basically) constant time--O(1)--to do (basically) any simple math operation. This includes addition, multiplication, division, exponentiation, square roots, etc: all of these operations take basically the same amount of time on any fixnum. We can make this claim because all fixnums are fairly small, and there's a fixed upper bound on the time these operations take, even if it varies a little bit. In designing algorithms, programmers take advantage of this frequently.

However, with bignums, math operations take O(n) or more time, where n is the number of digits (bits) in the larger number. If you have two integers of arbitrary length, the only way to add them is, basically, the addition algorithm you learned in first grade, iterating through the digits from least significant to most significant. The best possible time for this kind of iteration is proportional to the number of bits--O(n). Multiplication, division and other operations take even more time. For purposes of analysis, let's say multiplication is O(n log n), where n is the number of digits in the bigger number, and exponentiation is O(d log d), where d is the number of digits in the result. (These are slightly faster than the real times, but they give us a good enough estimate while keeping the math mostly simple.)

To be efficient in processing bignums, this additional time for processing must be taken into account. It's very easy to write something which works instantly on fixnums but hangs almost indefinitely on large enough bignums, but there is usually a better way.

string>number

So, I suspected that Doug's code was slow because of a naive implementation of string>number, one which is not optimized for bignums. Looking recursively through the code, I can see that the conversion to an integer takes place in the word digits>integer:

: string>number ( str -- n/f ) 10 base> ;

: base> ( str radix -- n/f )
    {
        { [ 47 pick member? ] [ string>ratio ] }
        { [ 46 pick member? ] [ drop string>float ] }
        { [ t ] [ string>integer ] }
    } cond ;

: string>integer ( str radix -- n/f )
    swap "-" ?head
    >r string>digits 2dup valid-digits?
    [ digits>integer r> [ neg ] when ] [ r> 3drop f ] if ;

: digits>integer ( radix seq -- n )
    0 rot [ swapd * + ] curry reduce ;

Basically, what this does is, for each item in the given sequence, an accumulator (starting at 0) is multiplied by the radix, and then the item is added to the accumulator. An example invocation of digits>integer, which returns the number 1234:

10 { 1 2 3 4 } digits>integer
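
For comparison, the same accumulator loop written as a Haskell fold (a sketch, with made-up names):

digitsToIntegerNaive :: Integer -> [Integer] -> Integer
digitsToIntegerNaive radix = foldl (\acc d -> acc * radix + d) 0

-- digitsToIntegerNaive 10 [1,2,3,4] == 1234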

Let's look at the computational complexity of running digits>integer, in a world where only fixnums exist. In this world, * and + run in constant time. Running digits>integer with a d-digit number will do d additions and d multiplications, for a total of d*(O(1)+O(1)) = O(d) time.

O(d) time is optimal, since the string's length is d in the first place, and we need to iterate through all of its characters.

But, if we assume that all arithmetic takes place in bignums, the calculation gets a little more complicated, and the time a bit worse. All together, O(d(d+1)/2) = O(d²) time is spent in addition, and O((d log d)(d log d + 1)/2) = O(d² log(d)²) time is spent in multiplication. The latter dominates the time, so the total complexity is O(d² log(d)²). This is even worse than quadratic! There must be a better way.

Minimizing intermediate bignums

The problem here is that the intermediate bignums are too big. In parsing "1234", the accumulator first contains 0, then 1, then 12, then 123 and finally 1234. So the sum of the intermediate number lengths is d(d+1)/2 = O(d²).

But here's another method: split the string into two equal parts, parse each of them individually, then combine the results. To combine them, the first half's value has to be shifted left by the length of the second half (using the appropriate radix!). You can write base cases for strings of length 0 and 1, which shouldn't be split. (The value of _ { } digits>integer is 0 and _ { n } digits>integer is n.)

An example: 10 { 1 2 3 4 } digits>integer splits into 10 { 1 2 } digits>integer and 10 { 3 4 } digits>integer. By induction, let's assume that those intermediate calculations produce 12 and 34. Now, the value 12 must be multiplied by 10² = 100, since { 3 4 } is two digits long. Now, add 1200 and 34, and you get 1234!
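
Here's that split strategy as a Haskell sketch (made-up names; the Factor version comes below):

digitsToInteger :: Integer -> [Integer] -> Integer
digitsToInteger _     []  = 0
digitsToInteger _     [d] = d
digitsToInteger radix ds  =
    digitsToInteger radix front * radix ^ length back + digitsToInteger radix back
  where
    (front, back) = splitAt (length ds `div` 2) ds

-- digitsToInteger 10 [1,2,3,4] == 1234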

The analysis for this is almost identical to that of mergesort or quicksort. For a string holding an 8-digit number, there are four main steps: the step processing the individual numbers (really, 8 steps of constant time), then the step combining two numbers of 1 digit each (4 steps of 2x time), then the step combining two of those numbers, 2 digits each (2 steps of 4x time), and the final step of adding the two four-digit numbers together (1 step of 8x time). If you generalize it, there's a total of log₂(d) + 1 steps of time O(d), yielding a total of O(d log d).

Actually...

It's a little bit more complicated than that. O(d log d) is something like the complexity for summing a list, resulting in a bignum. But I ignored the other, more expensive operation: the left shift. A base two shift would be O(s+d), where s is the amount shifted over, and d is the number of digits in the thing being shifted. With a base two shift, the complexity O(d log d) would still be valid.

But this shift has an arbitrary radix (usually 10). This is done by calculating the radix raised to the power of the shift, and then multiplying that by the number which needs to be shifted. This takes a bit more time. Counting up the time taken for multiplication and exponentiation in the same manner as addition, we get a total time of O(d log(d) log(d log d)).

Implementation

In our implementation of this function, it'd be best if we could just go into math.parser, the vocabulary that defines digits>integer, and redefine just that word. This redefinition would be propagated to all the words that use it, all the way up to the Factor parser. Fortunately, Factor explicitly allows this kind of invasion. Just make sure that, after loading the code, everything is recompiled! Otherwise, the change might not be propagated. Here's the code you need:

USING: math kernel math.functions sequences combinators ;
IN: digits

: partition ( seq -- front end )
    [ length 2 /i ] keep cut ;

IN: math.parser

DEFER: digits>integer

: split-parse ( radix seq -- n )
    partition [
        rot [ swap digits>integer ] curry 2apply
    ] 3keep nip
    length ^ swapd * + ;

: digits>integer ( radix seq -- n )
    dup length {
        { 0 [ 2drop 0 ] }
        { 1 [ nip first ] }
        [ drop split-parse ]
    } case ;

Loading this code makes parsing large bignums dramatically faster, though smaller numbers are a little bit slower. The easiest way to load the code is to put it in path extra/digits/digits.factor, and then run the line USE: digits in the listener.

So remember, boys and girls

Whenever doing mathematical calculations that might involve bignums, it's always important to remember the computational complexity of various mathematical operations. If you forget, a very doable problem can suddenly become intractable.



A technical note about complexity: (for the nit-picky readers among you)

In truth, the best known algorithm for bignum multiplication takes O(n log(n) log(log(n))) time, using fast Fourier transforms, which I don't yet understand. (Well, actually there's one of time O(n log(n) 2^(log*(n))), which is even faster, but no one uses that yet.) Therefore, exponentiation should take O(d log(d) log(log(d))) time, where d is the size of the final result. This is because the algorithm's time is dominated by the final doubling.

I felt that it was appropriate to use O(d log(d)) as an approximation of O(d log(d) log(log(d))), since the double log function grows very slowly, and it clutters up the math with no tangible result. For this analysis, it doesn't hurt anyone to elide it. If I were publishing this in some respectable academic thing (hah! as if that makes sense!), I'd change it at the expense of clarity.