While doing some random experimentation with a factorial program in C, Python and Scheme, I came across this fact:

In C, using the 'unsigned long long' data type, the largest factorial I can print is that of 65, which comes out as '9223372036854775808', that is, 19 digits, as specified here.

In Python, I can find the factorial of a number as large as 999, and the result has far more than 19 digits.

How does CPython achieve this? Does it use a data type like 'octaword'?

I might be missing some fundamental facts here. So, I would appreciate some insights and/or references to read. Thanks!

UPDATE: Thank you all for the explanation. Does that mean CPython is using the GNU Multi-precision library (or some other similar library)?

UPDATE 2: I am looking for Python's 'bignum' implementation in the sources. Where exactly is it? It's here at http://svn.python.org/view/python/trunk/Objects/longobject.c?view=markup. Thanks Baishampayan.

+1  A: 

Python assigns to long integers (all ints in Python 3) just as much space as they need -- an array of "digits" (base being a power of 2) allocated as needed.
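One way to watch this "as much space as needed" behaviour from Python itself is the sketch below. It relies on `sys.int_info` and `sys.getsizeof`, which exist in CPython; the helper name `storage_report` is made up for this illustration, and the exact byte counts depend on the build and platform.

```python
import math
import sys

# bits_per_digit is the power-of-2 base exponent CPython uses for its internal
# "digits"; sizeof_digit is the size in bytes of the C type holding one digit.
print(sys.int_info.bits_per_digit, sys.int_info.sizeof_digit)

def storage_report(ns):
    """Return (n, bit length of n!, bytes used by the int object) for each n."""
    report = []
    for n in ns:
        value = math.factorial(n)
        report.append((n, value.bit_length(), sys.getsizeof(value)))
    return report

for row in storage_report([1, 20, 100, 999]):
    print(row)
```

The object size grows in steps as the value needs more internal digits, which is what you would expect from a variable-length array.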

Alex Martelli
+3  A: 

Not octaword. CPython implements a bignum structure to store arbitrary-precision numbers.

kcwu
+8  A: 

It's called Arbitrary Precision Arithmetic. There's more here: http://en.wikipedia.org/wiki/Arbitrary-precision_arithmetic

sykora
Just to be clear, it is perfectly possible to do arbitrary precision arithmetic in pure C using libraries like GMP (http://gmplib.org/). It's fairly simple to write a factorial program using it.
Matthew Flaschen
When did I say it wasn't possible?
sykora
I didn't mean to imply you had said that.
Matthew Flaschen
And you actually didn't either! ;-)
none
+5  A: 

Data types such as int in C are directly mapped (more or less) to the data types supported by the processor. So the limits on C's int are essentially the limits imposed by the processor hardware.
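To make the hardware limit concrete, here is a small Python sketch that imitates a fixed-width unsigned integer by masking every product to 64 bits. It assumes that 'unsigned long long' is exactly 64 bits wide (C only guarantees at least 64), and `factorial_u64` is a name invented for this example.

```python
import math

MASK64 = (1 << 64) - 1  # largest value a 64-bit unsigned integer can hold

def factorial_u64(n):
    """Compute n! while truncating every product to 64 bits,
    the way unsigned 64-bit arithmetic wraps around in C."""
    result = 1
    for k in range(2, n + 1):
        result = (result * k) & MASK64
    return result

for n in (20, 21, 65, 66):
    wrapped = factorial_u64(n)
    exact = math.factorial(n)
    print(n, wrapped, "exact" if wrapped == exact else "wrapped around")
```

Under that assumption, 20! is the last factorial that fits. From 21! onwards the stored value is only the low 64 bits of the true result. The value seen for 65 is 2**63, because 65! contains exactly 63 factors of 2, and from 66! onwards the low 64 bits are all zero.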

But one can implement one's own int data type entirely in software. You can, for example, use an array of digits as your underlying representation. Maybe like this:

class MyInt {
    // One array element per digit of the number.
    private int[] digits;

    public MyInt(int noOfDigits) {
        digits = new int[noOfDigits];
    }
}

Once you do that, you can use this class to store integers containing as many digits as you want, as long as you don't run out of memory.

Perhaps Python is doing something like this inside its virtual machine. You may want to read this article on Arbitrary Precision Arithmetic to get the details.
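The digit-array idea can be sketched in a few lines of Python. This is only a proof of concept, not how CPython does it: it uses base 10 and a plain list for readability, and the function names are invented for the example. The digits are stored least significant first so that a carry can simply be appended at the end.

```python
import math

def multiply_digits(digits, factor):
    """Multiply a little-endian list of base-10 digits by a small non-negative int."""
    result = []
    carry = 0
    for d in digits:
        # Schoolbook multiplication: keep the low digit, carry the rest.
        carry, low = divmod(d * factor + carry, 10)
        result.append(low)
    while carry:
        carry, low = divmod(carry, 10)
        result.append(low)
    return result

def factorial_digits(n):
    """Return n! as a little-endian list of base-10 digits."""
    digits = [1]
    for k in range(2, n + 1):
        digits = multiply_digits(digits, k)
    return digits

def to_string(digits):
    """Render a little-endian digit list as a decimal string."""
    return "".join(str(d) for d in reversed(digits))

print(to_string(factorial_digits(65)))
# Cross-check against Python's built-in arbitrary-precision integers.
print(to_string(factorial_digits(65)) == str(math.factorial(65)))
```

The only limit on the size of the result here is available memory, since the list grows by one element whenever a carry spills over.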

Frederick
Frederick's right, but this code should be seen as a proof of concept, not a recommended design. Real arbitrary precision libraries like GMP (http://gmplib.org/) or BigInteger operate more efficiently, and don't rely unnecessarily on base 10.
Matthew Flaschen
As an aside: on processors which have native support for BCD arithmetic, it is (or used to be) common to do bignums in base 10 to simplify the human interaction part of the code. 'Course, this requires assembly or a compiler that supports BCD types...
dmckee
+8  A: 

Looking at the Python source code, it seems the long type (at least in pre-Python 3 code) is defined in longintrepr.h like this -

/* Long integer representation.
   The absolute value of a number is equal to
    SUM(for i=0 through abs(ob_size)-1) ob_digit[i] * 2**(SHIFT*i)
   Negative numbers are represented with ob_size < 0;
   zero is represented by ob_size == 0.
   In a normalized number, ob_digit[abs(ob_size)-1] (the most significant
   digit) is never zero.  Also, in all cases, for all valid i,
    0 <= ob_digit[i] <= MASK.
   The allocation function takes care of allocating extra memory
   so that ob_digit[0] ... ob_digit[abs(ob_size)-1] are actually available.

   CAUTION:  Generic code manipulating subtypes of PyVarObject has to
   aware that longs abuse  ob_size's sign bit.
*/

struct _longobject {
    PyObject_VAR_HEAD
    digit ob_digit[1];
};

The actual usable interface of the long type is then defined in longobject.h by creating a new type PyLongObject like this -

typedef struct _longobject PyLongObject;

And so on.

There is more happening inside longobject.c; you can take a look at it for more details.
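The formula in the header comment above can be mirrored in Python as a rough sketch. It is not the actual C code: `to_digits` and `from_digits` are hypothetical helpers, and SHIFT is taken from `sys.int_info.bits_per_digit` on the assumption that it plays the role of SHIFT in the comment for the running interpreter.

```python
import sys

SHIFT = sys.int_info.bits_per_digit  # bits stored in each internal digit
MASK = (1 << SHIFT) - 1              # largest value a single digit can take

def to_digits(value):
    """Split an int into (size, digits): digits are base 2**SHIFT, least
    significant first, and the sign of size carries the sign of the number."""
    magnitude = abs(value)
    digits = []
    while magnitude:
        digits.append(magnitude & MASK)
        magnitude >>= SHIFT
    size = len(digits) if value >= 0 else -len(digits)
    return size, digits

def from_digits(size, digits):
    """Rebuild the int as SUM(digits[i] * 2**(SHIFT*i)), negated if size < 0."""
    total = sum(d << (SHIFT * i) for i, d in enumerate(digits[:abs(size)]))
    return -total if size < 0 else total

size, digits = to_digits(-(2 ** 100))
print(size, digits)
print(from_digits(size, digits) == -(2 ** 100))
```

As in the comment, zero comes out with a size of 0 and no digits, and the most significant digit of any other number is never zero.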

Baishampayan Ghose
Thanks a lot! longobject.c it is :) I had only looked in the Python/ sub-directory of the sources, missing the fact that it's all Objects in Python!
Amit