views:

369

answers:

7

I'm writing a utility to calculate π to a million digits after the decimal. On a 32- or 64-bit consumer desktop system, what is the most efficient way to store and work with such a large number accurate to the millionth digit?

clarification: The language would be C.

+3  A: 

I'd recommend storing it as an array of short ints, one per digit, and then carefully write utility classes to add and subtract portions of the number. You'll end up moving from this array of ints to floats and back, but you need a 'perfect' way of storing the number - so use its exact representation. This isn't the most efficient way in terms of space, but a million ints isn't very big.

It's all in the way you use the representation. Decide how you're going to 'work with' this number, and write some good utility functions.

Peter
hi, thanks for the reply. with the algorithms already known and out there, I can quickly generate more digits than I know how to handle. The problem would come with doing any sort of floating point math, I'm not sure how to tell whether the result is accurate or if it's being affected by the nuances of, say, an IEEE 754 float. I could calculate it one digit at a time, and storing them all as a huge C array of shorts... but that seems to be brute force
pxl
+8  A: 

Forget floating point, you need bit strings that represent integers

This takes a bit less than 1/2 megabyte per number. "Efficient" can mean a number of things. Space-efficient? Time-efficient? Easy-to-program with?

Your question is tagged floating-point, but I'm quite sure you do not want floating point at all. The entire idea of floating point is that our data is only known to a few significant figures and even the famous constants of physics and chemistry are known precisely to only a handful or two of digits. So there it makes sense to keep a reasonable number of digits and then simply record the exponent.

But your task is quite different. You must account for every single bit. Given that, no floating point or decimal arithmetic package is going to work unless it's a template you can arbitrarily size, and then the exponent will be useless. So you may as well use integers.

What you really really need is a string of bits. This is simply an array of convenient types. I suggest <stdint.h> and simply using uint32_t[125000] (or 64) to get started. This actually might be a great use of the more obscure constants from that header that pick out bit sizes that are fast on a given platform.

To be more specific we would need to know more about your goals. Is this for practice in a specific language? For some investigation into number theory? If the latter, why not just use a language that already supports Bignum's, like Ruby?

Then the storage is someone else's problem. But, if what you really want to do is implement a big number package, then I might suggest using bcd (4-bit) strings or even ordinary ascii 8-bit strings with printable digits, simply because things will be easier to write and debug and maximum space and time efficiency may not matter so much.

DigitalRoss
great points. this is an investigation into cs theory. I'm using C as the language as it's what i'm most familiar with. I'm using the PI algorithm as algorithms for it are well known as well as the exact digits up to tens of millions of digits, so I can test my algorithm out to see if it's working. Efficiency would, as is always, a tradeoff between memory and time. thanks!
pxl
Thank you for giving me a reason to actually *like* Ruby.
MusiGenesis
A: 

i once worked on an application that used really large numbers (but didnt need good precision). What we did was store the numbers as logarithms since you can store a pretty big number as a log10 within an int.

Think along this lines before resorting to bit stuffing or some complex bit representations.

I am not too good with complex math, but i reckon there are solutions which are elegant when storing numbers with millions of bits of precision.

Andrew Keith
numbers as logs, interesting thought. but i guess that's where only requiring good precision would work. log10 of a huge number, and converting back won't always equal the same number.
pxl
+1  A: 

Unless you're writing this purely for fun and/or learning, I'd recommend using a library such as GNU Multiprecision. Look into the mpf_t data type and its associated functions for storing arbitrary-precision floating-point numbers.

If you are just doing this for fun/learning, then represent numbers as an array of chars, which each array element storing one decimal digit. You'll have to implement long addition, long multiplication, etc.

Adam Rosenfield
wouldn't that incur a lot of overhead converting a char to a digit and back for each element?
pxl
Not if you treat them as raw bytes instead of ASCII characters. If you used `'0'` through `'9'`, you'd have a little extra overhead, but not if you just used `(char)0` through `(char)9`.
Adam Rosenfield
A: 

Try PARI/GP, see wikipedia.

Ewan Todd
hmmm.. interesting
pxl
+2  A: 

If you're willing to tolerate computing pi in hex instead of decimal, there's a very cute algorithm that allows you to compute a given hexadecimal digit without knowing the previous digits. This means, by extension, that you don't need to store (or be able to do computation with) million digit numbers.

Of course, if you want to get the nth decimal digit, you will need to know all of the hex digits up to that precision in order to do the base conversion, so depending on your needs, this may not save you much (if anything) in the end.

Stephen Canon
using hex instead of decimal is a neat idea, and would allow you to store more digits, in a way, per digit. and would take up less room, work nicely with binary math, etc... but having to calculate out each digit successively and not storing the result would be a killer. neat algorithm. thanks
pxl
+1  A: 

You could store its decimals digits as text in a file and mmap it to an array.

lhf
interesting idea
pxl
The data can be found at http://www.eveandersson.com/pi/digits/1000000
lhf