I am a writing a lexer as part of a compiler project and I need to detect if an integer is larger than what can fit in a int so I can print an error. Is there a C++ standard library for big integers that could fit this purpose?
You might want to check out GMP if you want to be able to deal with such numbers.
You can check to see if the number is higher or lower than INT_MAX or INT_MIN respectively. You would need to #include <limits.h>
You could probably use libgmp. However, I think for your purpose, it's just unnecessary.
If you, for example, parse your numbers to 32-bit unsigned int, you
- parse the first at most 9 decimal numbers (that's floor(32*log(2)/log(10)). If you haven't more, the number is OK.
- take the next digit. If the number you got / 10 is not equal to the number from the previous step, the number is bad.
- if you have more digits (eg. more than 9+1), the number is bad.
- else the number is good.
Be sure to skip any leading zeros etc.
In your lexer as you parse the integer string you must multiply by 10 before you add each new digit (assuming you're parsing from left to right). If that running total suddenly becomes negative, you've exceeded the precision of the integer.
libgmp is a general solution, though maybe a bit heavyweight.
For a lighter-weight lexical analyzer, you could treat it as a string; trim leading zeros, then if it's longer than 10 digits, it's too long; if shorter then it's OK, if exactly 10 digits, string compare to the max values 2^31=2147483648 or 2^32=4294967296. Keep in mind that -2^31 is a legal value but 2^31 isn't. Also keep in mind the syntax for octal and hexadecimal constants.
How about this. Use atol, and check for overflow and underflow.
#include <iostream>
#include <string>
using namespace std;
main()
{
string str;
cin >> str;
int i = atol(str.c_str());
if (i == INT_MIN && str != "-2147483648") {
cout << "Underflow" << endl;
} else if (i == INT_MAX && str != "2147483647") {
cout << "Overflow" << endl;
} else {
cout << "In range" << endl;
}
}
To everyone suggesting atoi:
- My atoi() implementation does not set errno.
- My atoi() implementation does not return INT_MIN or INT_MAX on overflow.
- We cannot rely on sign reversal. Consider 0x4000...0.
- *2 and the negative bit is set.
- *4 and the value is zero.
- With base-10 numbers our next digit would multiply this by 10.
This is all nuts. Unless your lexer is parsing gigs of numerical data, stop the premature optimization already. It only leads to grief.
This approach may be inefficient, but it's adequate for your needs:
const char * p = "1234567890123";
int i = atoi( p );
ostringstream o;
o << i;
return o.str() == p;
Or, leveraging the stack:
const char * p = "1234567890123";
int i = atoi( p );
char buffer [ 12 ];
snprintf( buffer, 12, "%d", i );
return strcmp(buffer,p) == 0;
If your language (like C) supports compile-time evaluation of expressions, then you might need to think about that, too.
Stuff like this:
#define N 2147483643 // This is 2^31-5, i.e. close to the limit.
int toobig = N + N;
GCC will catch this, saying "warning: integer overflow in expression", but of course no individual literal is overflowing. This might be more than you require, just thought I'd point it out as stuff that real compilers to in this department.