tags:

views:

622

answers:

9
+2  Q: 

C++: Big Integers

I am a writing a lexer as part of a compiler project and I need to detect if an integer is larger than what can fit in a int so I can print an error. Is there a C++ standard library for big integers that could fit this purpose?

A: 

You might want to check out GMP if you want to be able to deal with such numbers.

turbovince
A: 

You can check to see if the number is higher or lower than INT_MAX or INT_MIN respectively. You would need to #include <limits.h>

Suroot
i think the problem proposed in the question is what datatype do you store the value in in order to do the comparison you suggest...
Evan Teran
-1. As Evan suggests, any value that you read into an int is <= INT_MAX and >= INT_MIN by definition. But choosing some larger datatype (e.g. long long) is a hack, since that solution still fails if the number read in exceeds the range of *that* datatype.
j_random_hacker
Double tends to be quite sufficient for that. Perhaps not in the theoretical sense, but I don't forsee 64-byte integers any time soon.
MSalters
+9  A: 

The Standard C library functions for converting number strings to integers are supposed to detect numbers which are out of range, and set errno to ERANGE to indicate the problem. See here

Sol
+5  A: 

You could probably use libgmp. However, I think for your purpose, it's just unnecessary.

If you, for example, parse your numbers to 32-bit unsigned int, you

  1. parse the first at most 9 decimal numbers (that's floor(32*log(2)/log(10)). If you haven't more, the number is OK.
  2. take the next digit. If the number you got / 10 is not equal to the number from the previous step, the number is bad.
  3. if you have more digits (eg. more than 9+1), the number is bad.
  4. else the number is good.

Be sure to skip any leading zeros etc.

jpalecek
For (1), I think you mean "If you have less than 9" isntead of "If you haven't more".
strager
+1 you don't need a bigint library for this. It just takes some string parsing.
cletus
stranger: No I mean if you haven't more than 9, or equivalently if you hav less than or equal 9. 9 is still OK in this case
jpalecek
A: 

In your lexer as you parse the integer string you must multiply by 10 before you add each new digit (assuming you're parsing from left to right). If that running total suddenly becomes negative, you've exceeded the precision of the integer.

it is slightly more complicated than checking for sign flip. :-P.
Evan Teran
Consider 0x4000...0 times 10.
Mr.Ree
+3  A: 

libgmp is a general solution, though maybe a bit heavyweight.

For a lighter-weight lexical analyzer, you could treat it as a string; trim leading zeros, then if it's longer than 10 digits, it's too long; if shorter then it's OK, if exactly 10 digits, string compare to the max values 2^31=2147483648 or 2^32=4294967296. Keep in mind that -2^31 is a legal value but 2^31 isn't. Also keep in mind the syntax for octal and hexadecimal constants.

Liudvikas Bukys
+1  A: 

How about this. Use atol, and check for overflow and underflow.

#include <iostream>
#include <string>

using namespace std;

main()
{
    string str;
    cin >> str;
    int i = atol(str.c_str());
    if (i == INT_MIN && str != "-2147483648") {
        cout << "Underflow" << endl;
    } else if (i == INT_MAX && str != "2147483647") {
        cout << "Overflow" << endl;
    } else {
        cout << "In range" << endl;
    }   
}
DasBoot
INT_MIN == -2147483648, not -2147483647
Evan Teran
@Evan. Thanks Fixed.
DasBoot
Does atol() return INT_MIN (or INT_MAX) for values too small (or too large) to be represented? I'm guessing that it doesn't, since the man page doesn't explicitly guarantee that it does, and doing so would complicate the implementation, as well as slow it down for typical cases.
j_random_hacker
@j_random_hacker. It works on G++ for me. Also, this suggests that it does too: http://www.cplusplus.com/reference/clibrary/cstdlib/atol.html
DasBoot
"If the correct value is out of the range of representable values, LONG_MAX or LONG_MIN is returned."
DasBoot
@DasBoot: Thanks for the info and link. I can verify that it atoi() behaves as you say on MSVC++8 on Windows (despite a rather confused explanation on the MSDN reference page) and on g++ 4.1.2 on Linux. Though it doesn't work on MinGW (g++ 3.4.5 on Windows).
j_random_hacker
@j_random_hacker: Interesting. Thanks for the info. I can verify that it works with g++ 4.0.1 on Mac as well.
DasBoot
+1  A: 

To everyone suggesting atoi:

  • My atoi() implementation does not set errno.
  • My atoi() implementation does not return INT_MIN or INT_MAX on overflow.
  • We cannot rely on sign reversal. Consider 0x4000...0.
    • *2 and the negative bit is set.
    • *4 and the value is zero.
    • With base-10 numbers our next digit would multiply this by 10.

This is all nuts. Unless your lexer is parsing gigs of numerical data, stop the premature optimization already. It only leads to grief.

This approach may be inefficient, but it's adequate for your needs:

const char * p = "1234567890123";
int i = atoi( p );

ostringstream o;
o << i;
return o.str() == p;

Or, leveraging the stack:

const char * p = "1234567890123";
int i = atoi( p );

char buffer [ 12 ];
snprintf( buffer, 12, "%d", i );
return strcmp(buffer,p) == 0;
Mr.Ree
I guess I fail to see how using the standard-defined way of detecting this problem is somehow "premature optimization". But if your atoi is broken (which atoi is it?), your suggestion seems a reasonable way of working around it.
Sol
A: 

If your language (like C) supports compile-time evaluation of expressions, then you might need to think about that, too.

Stuff like this:

#define N 2147483643  // This is 2^31-5, i.e. close to the limit.

int toobig = N + N;

GCC will catch this, saying "warning: integer overflow in expression", but of course no individual literal is overflowing. This might be more than you require, just thought I'd point it out as stuff that real compilers to in this department.

unwind