## Sunday, September 23, 2007

### Using bigums efficiently

The other day, Doug Coleman got on the `#concatenative` IRC channel complaining of a horrible bug: when he put a 100,000 digit prime number in a file, and then tried to load the file, Factor hangs. Doug speculated that this was a compiler bug, but I had another idea: the parser wasn't processing bignums efficiently. First, a little background. This article presumes some basic knowledge of computational complexity and big-O notation, which you should read up on, if you don't know about already.

Bignums and performance

A 'bignum' is Factor's term (grabbed from Lisp terminology) for an arbitrary size integer bigger than the standard integer. Integers which do fit in a word (actually, a word minus 3 bits for the header) are called 'fixnums'. On any platform that Factor currently supports, you can count on the fact that a number smaller than 229 will be a fixnum, and a number bigger than 261-1 will be a bignum.

In most situations, this is an irrelevant implementation detail. In Factor, bignums are used with the same functions as fixnums (and all other builtin numeric types). But there is a subtle performance issue. On fixnums, it takes (basically) constant time--O(1)--to do (basically) any simple math operation. This includes addition, multiplication, division, exponentiation, square roots, etc: all of these operations take basically the same amount of time on any fixnum. We can make this claim because all numbers are fairly small, and there's a short maximum bound on the time they take, even if it varies a little bit. In designing algorithms, programmers take advantage of this frequently.

However, with bignums, math operations take O(n) or more time, where n is the number of digits (bits) in the larger number. If you have two integers of arbitrary length, the only thing you can do to add them is, basically, the addition algorithm you learned in first grade, iterating through the string from least significant to most significant bit. The best possible time for this kind of iteration is proportional to the number of bits--O(n). Multiplication, division and other operations take even more time. For purposes of analysis, let's say multiplication is O(n*log n) where n is the number of digits in the bigger number, and exponentiation is O(d log d), where d is the number of digits in the result. (These are slightly faster than the real times, but give us a good enough estimate while leaving the math mostly simple.)

To be efficient in processing bignums, this additional time for processing must be taken into account. It's very easy to write something which works instantly on fixnums but hangs almost indefinitely on large enough bignums, but there is usually a better way.

`string>number`

So, I suspected that Doug's code was slow because of a naive implementation of `string>number`, one which is not optimized for bignums. Looking recursively through the code, I can see that the conversion from numbers takes place in the word `digit>integer`:
`: string>number ( str -- n/f ) 10 base> ;: base> ( str radix -- n/f )    {        { [ 47 pick member? ] [ string>ratio ] }        { [ 46 pick member? ] [ drop string>float ] }        { [ t ] [ string>integer ] }    } cond ;: string>integer ( str radix -- n/f )    swap "-" ?head    >r string>digits 2dup valid-digits?    [ digits>integer r> [ neg ] when ] [ r> 3drop f ] if ;: digits>integer ( radix seq -- n )    0 rot [ swapd * + ] curry reduce ;`

Basically, what this does is, for each item in the given sequence, an accumulator (starting at 0) is multiplied by the radix, and then the item is added to the accumulator. An example invocation of `digits>integer`, which returns the number 1234:
`10 { 1 2 3 4 } digits>integer`

Let's look at the computational complexity of running digits>integer, in a world where only fixnums exist. In this world, `*` and `+` run in constant time. Running digits>integer with a d-digit number will do d additions and d multiplications, for a total of d*(O(1)+O(1)) = O(d) time.

O(d) time is optimal, since the string's length is d in the first place, and we need to iterate through all of its characters.

But, if we assume that all arithmetic takes place in bignums, the calculation gets a little more complicated, and the time a bit worse. All together, O((d(d+1))/2) = O(d2) time is spent in addition, and O(((d log(d))(d log(d)+1))/2) = O((d2log(d)2) time is spent in multiplication. The latter dominates the time, so the total complexity is O((d2log(d)2). This is even worse than quadratic! There must be a better way.

Minimizing intermediate bignums

The problem here is that the numbers that the intermediate bignums are too big. In parsing "1234", the accumulator first contains 0, then 1, then 12, then 123 and finally 1234. So the sum of the intermediate number lengths is d(d+1)/2 = O(d2).

But here's another method: split the string in two equal parts, parse each of them individually, then combine the results together. To combine the results together, the first string has to be shifted left by the length of the second string (using the appropriate radix!). You can write base cases for strings of length 0 and 1, which shouldn't be split. (The value of `_ { } digit>integer` is 0 and `_ { n } digit>integer` is n.)

An example: to do `10 { 1 2 3 4 } digit>integer` splits into `10 { 1 2 } digits>integer` and `10 { 3 4 } digits>integer`. By induction, let's assume that those intermediate calculations produce 12 and 34. Now, the value 12 must be multiplied by 102=100, since `{ 3 4 }` is two digits long. Now, add 1200 and 34, and you get 34!

The analysis for this is almost identical to that of mergesort or quicksort. For a string holding a 8-digit number, there are four main steps: the step processing the individual numbers (really, 8 steps of constant time), then the step combining two numbers of 1 digit each (4 steps of 2x time), then the step combining two of those numbers, 2 digits each (2 steps of 4x time), and the final step of adding the two four-digit numbers together (1 step of 8x time). If you generalize it, there's a total of log2(d)+1 steps of time O(d), yielding a total of O(d log d).

Actually...

It's a little bit more complicated than that. O(d log d) is something like the complexity for summing a list, resulting in a bignum. But I ignored the other, more expensive operation: the left shift. A base two shift would be O(s+d), where s is the amount shifted over, and d is the number of digits in the thing being shifted. With a base two shift, the complexity O(d log d) would still be valid.

But this shift has an arbitrary radix (usually 10). This is done by calculating the radix raised to the power of the shift, and then multiplying that by the number which needs to be shifted. This takes a bit more time. Counting up the time taken for multiplication and exponentiation in the same manner as addition, we get a total time of O(d*log d*log (d log d)).

Implementation

In our implementation of this function, it'd be the best if we could just go into `math.parser`, the vocabulary that defines `digit>integer`, and redefine just that word. This redefinition would be propagated to all the words that use it, all the way up to the Factor parser. Fortunately, Factor explicitly allows this kind of invasion. Just make sure that, after loading the code, everything is recompiled! Otherwise, the change might not be propagated. Here's the code you need:
`USING: math kernel math.functions sequences combinators ;IN: digits: partition ( seq -- front end )    [ length 2 /i ] keep cut ;IN: math.parserDEFER: digits>integer: split-parse ( radix seq -- n )    partition [        rot [ swap digits>integer ] curry 2apply    ] 3keep nip    length ^ swapd * + ;: digits>integer ( radix seq -- n )    dup length {        { 0 [ 2drop 0 ] }        { 1 [ nip first ] }        [ drop split-parse ]    } case ;`

Loading this code makes parsing large bignums dramatically faster, though smaller numbers are a little bit slower. The easiest way to load the code is to put it in path `extra/digits/digits.factor`, and then run the line `USE: digits` in the listener.

So remember, boys and girls

Whenever doing mathematical calculations that might involve bignums, it's always important to remember the computational complexity of various mathematical operations. If you forget them, a very doable problem can suddenly become intractable.

In truth, the best known algorithm for bignum multiplication takes O(n log(n) log(log(n))) time, using fast Fourier transforms, which I don't yet understand. (Well, actually there's one of time O(n log(n) 2^(log*(n))), which is even faster, but no one uses that yet.) Therefore, exponentiation should take O(d log(d) log(log(d))) time, where d is the size of the final result. This is because the algorithm's time is dominated by the final doubling.

I felt that it was appropriate to use O(d log(d)) as an approximation of O(d log(d) log(log(d))), since the double log function grows very slowly, and it clutters up the math with no tangible result. For this analysis, it doesn't hurt anyone to elide it. If I were publishing this in some respectable academic thing (hah! as if that makes sense!), I'd change it at the expense of clarity.