views:

1373

answers:

8

I'm working on a programming language, and today I got the point where I could compile the factorial function(recursive), however due to the maximum size of an integer the largest I can get is factorial(12). What are some techniques for handling integers of an arbitrary maximum size. The language currently works by translating code to C++.

+8  A: 

If you need larger than 32-bits you could consider using 64-bit integers (long long), or use or write an arbitrary precision math library, e.g. GNU MP.

Barry Kelly
+4  A: 

It's been a while since I've used C++, but since no-one has answered I figured I'd point you on the right track - what you are looking for is often referred to as "Bignum", basically some kind of class that handles arbitrarily large integer numbers. Here is one such library:

http://ttmath.slimaczek.pl/ttmath

Edit: Looking at the page for GMP, that looks like a really solid approach.

Kendall Helmstetter Gelner
+1  A: 

There's no easy way to do it in C++. You'll have to use an external library such as GNU Multiprecision, or use a different language which natively supports arbitrarily large integers such as Python.

Adam Rosenfield
I think it's fairly easy. GMP comes along with a nice C++ header
sellibitze
A: 

Other posters have given links to libraries that will do this for you, but it seem like you're trying to build this into your language. My first thought is: are you sure you need to do that? Most languages would use an add-on library as others have suggested.

Assuming you're writing a compiler and you do need this feature, you could implement integer arithmetic functions for arbitrarily large values in assembly.

For example, a simple (but non-optimal) implementation would represent the numbers as Binary Coded Decimal. The arithmetic functions could use the same algorithms as you'd use if you were doing the math with pencil and paper.

Also, consider using a specialized data type for these large integers. That way "normal" integers can use the standard 32 bit arithmetic.

Clayton
What's people's obsession with BCD? Nodoby asked for it here.
sellibitze
A: 

My prefered approach would be to use my current int type for 32-bit ints(or maybe change it to internally to be a long long or some such, so long as it can continue to use the same algorithms), then when it overflows, have it change to storing as a bignum, whether of my own creation, or using an external library. However, I feel like I'd need to be checking for overflow on every single arithmetic operation, roughly 2x overhead on arithmetic ops. How could I solve that?

Alex Gaynor
Don't worry too much about performance. Code it without any regard to performance, then go in and refactor stuff if you can't meet some benchmark.
Bill K
Yeah, you can even refactor a bubblesort to a mergesort... And you certainly want a good marketing image compared to others to sell your shrink wrapped boxes of your general purpose OO language to big boys. What? is it not general purpose?
artificialidiot
The problems you describe are why I suggested creating a new data type. C++, Java, etc. would not automatically convert a 16-bit int to 32 bits if a multiplication overflowed, so why should you. On the other hand, if this is a documented requirement, you'll just have to accept the performance hit.
Clayton
+1  A: 

If you're building this into a language (for learning purposes I'd guess), I think I would probably write a little BCD library. Just store your BCD numbers inside byte arrays.

In fact, with today's gigantic storage abilities, you might just use a byte array where each byte just holds a digit (0-9). You then write your own routine to add, subtract multiply and divide your byte arrays.

(Divide is the hard one, but I bet you can find some code out there somewhere.)

I can give you some Java-like psuedocode but can't really do C++ from scratch at this point:

class BigAssNumber {
    private byte[] value;

    // This constructor can handle numbers where overflows have occurred.
    public BigAssNumber(byte[] value) {
        this.value=normalize(value);
    }

    // Adds two numbers and returns the sum.  Originals not changed.
    public BigAssNumber add(BigAssNumber other) {
        // This needs to be a byte by byte copy in newly allocated space, not pointer copy!
        byte[] dest = value.length > other.length ? value : other.value;         

        // Just add each pair of numbers, like in a pencil and paper addition problem.
        for(int i=0; i<min(value.length, other.value.length); i++)
            dest[i]=value[i]+other.value[i];

        // constructor will fix overflows.
        return new BigAssNumber(dest);
    }

    // Fix things that might have overflowed  0,17,22 will turn into 1,9,2        
    private byte[] normalize(byte [] value) {
        if (most significant digit of value is not zero)
            extend the byte array by a few zero bytes in the front (MSB) position.

        // Simple cheap adjust.  Could lose inner loop easily if It mattered.
        for(int i=0;i<value.length;i++)
            while(value[i] > 9) {
                value[i] -=10;
                value[i+1] +=1;
            }
        }
    }
}

I use the fact that we have a lot of extra room in a byte to help deal with addition overflows in a generic way. Can work for subtraction too, and help on some multiplies.

Bill K
And if you're not doing it for school, just go grab some BigInteger code somewhere and use that.
Bill K
+1  A: 

If you want to roll your own arbitrary precision library, see Knuth's Seminumerical Algorithms, volume 2 of his magnum opus.

John D. Cook
+1 for Knuth -- otherwise it's possible you'll miss out the rare corner case for division.
Steve Gilham
A: 

If I were implement my own language and want to support arbitrary length numbers, I will use a target language with the carry/borrow concept. But since there is no HLL that implements this without severe performance implications (like exceptions), I will certainly go implement it in assembly. It will probably take a single instruction (as in JC in x86) to check for overflow and handle it (as in ADC in x86), which is an acceptable compromise for a language implementing arbitrary precision. Then I will use a few functions written in assembly instead of regular operators, if you can utilize overloading for a more elegant output, even better. But I don't expect generated C++ to be maintainable (or meant to be maintained) as a target language.

Or, just use a library which has more bells and whistles than you need and use it for all your numbers.

As a hybrid approach, detect overflow in assembly and call the library function if overflow instead of rolling your own mini library.

artificialidiot