views:

550

answers:

9

At first glance, this question may seem like a duplicate of http://stackoverflow.com/questions/199333/best-way-to-detect-integer-overflow-in-c-c; however, it is actually significantly different.

I've found that while detecting an unsigned integer overflow is pretty trivial, detecting a signed overflow in C/C++ is actually more difficult than most people think.

The most obvious, yet naive, way to do it would be something like:

int add(int lhs, int rhs)
{
 int sum = lhs + rhs;
 if ((lhs >= 0 && sum < rhs) || (lhs < 0 && sum > rhs)) {
  /* an overflow has occurred */
  abort();
 }
 return sum; 
}

The problem with this is that according to the C standard, signed integer overflow is undefined behavior. In other words, according to the standard, as soon as you even cause a signed overflow, your program is just as invalid as if you dereferenced a null pointer. So you can't cause undefined behavior, and then try to detect the overflow after the fact, as in the above post-condition check example.

Even though the above check is likely to work on many compilers, you can't count on it. In fact, because the C standard says signed integer overflow is undefined, some compilers (like GCC) will optimize away the above check when optimization flags are set, because the compiler assumes a signed overflow is impossible. This totally breaks the attempt to check for overflow.

So, another possible way to check for overflow would be:

int add(int lhs, int rhs)
{
 if (lhs >= 0 && rhs >= 0) {
  if (INT_MAX - lhs <= rhs) {
   /* overflow has occurred */
   abort();
  }
 }
 else if (lhs < 0 && rhs < 0) {
  if (lhs <= INT_MIN - rhs) {
   /* overflow has occurred */
   abort();
  }
 }

 return lhs + rhs;
}

This seems more promising, since we don't actually add the two integers together until we make sure in advance that performing such an add will not result in overflow. Thus, we don't cause any undefined behavior.

However, this solution is unfortunately a lot less efficient than the initial solution, since you have to perform a subtract operation just to test if your addition operation will work. And even if you don't care about this (small) performance hit, I'm still not entirely convinced this solution is adequate. The expression lhs <= INT_MIN - rhs seems exactly like the sort of expression the compiler might optimize away, thinking that signed overflow is impossible.

So is there a better solution here? Something that is guaranteed to 1) not cause undefined behavior, and 2) not provide the compiler with an opportunity to optimize away overflow checks? I was thinking there might be some way to do it by casting both operands to unsigned, and performing checks by rolling your own two's-complement arithmetic, but I'm not really sure how to do that.
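For what it's worth, here is a sketch of the unsigned-arithmetic idea I have in mind (the helper name is just for illustration, and I don't know whether this is fully portable — which is partly why I'm asking):

```c
#include <limits.h>

/* Returns nonzero if lhs + rhs would overflow an int.
 * The addition is done on unsigned values, which wrap with
 * well-defined behavior instead of invoking UB. */
int add_would_overflow(int lhs, int rhs)
{
    unsigned int a = (unsigned int)lhs;
    unsigned int b = (unsigned int)rhs;
    unsigned int sum = a + b;              /* well-defined wraparound */
    unsigned int sign = ~(UINT_MAX >> 1);  /* mask for the sign bit */
    /* overflow iff both operands carry the sign bit that the result lacks */
    return (((a ^ sum) & (b ^ sum)) & sign) != 0;
}
```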

A: 

You may have better luck converting to 64-bit integers and testing the analogous conditions there. For example:

#include <stdint.h>
#include <limits.h>

...

int64_t sum = (int64_t)lhs + (int64_t)rhs;
if (sum < INT_MIN || sum > INT_MAX) {
    // Overflow occurred!
}
else {
    return sum;
}

You may want to take a closer look at how sign extension will work here, but I think it is correct.
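Put into a complete function (assuming a 32-bit int, so that int64_t can hold any sum of two ints), that would be something like:

```c
#include <stdint.h>
#include <limits.h>
#include <stdlib.h>

int add(int lhs, int rhs)
{
    int64_t sum = (int64_t)lhs + (int64_t)rhs;  /* cannot overflow in 64 bits */
    if (sum < INT_MIN || sum > INT_MAX)
        abort();  /* overflow occurred */
    return sum;   /* value fits in int, so the conversion is well-defined */
}
```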

Jonathan
Remove the bitwise-and and cast from the return statement. They're incorrect as written. Conversion from larger signed integer types to smaller ones is perfectly well-defined as long as the value fits in the smaller type, and it does not need an explicit cast. Any compiler that gives a warning and suggests that you add a cast when you've just checked that the value does not overflow is a broken compiler.
R..
@R You are correct, I just like being explicit about my casts. I'll change it for correctness, though. For future readers, the return line read `return (int32_t)(sum `.
Jonathan
+11  A: 

If you use inline assembler you can check the overflow flag directly. Another possibility is to use a safeint datatype. I recommend reading this paper on Integer Security.

Rook
+1 This is another way of saying "If C won't define it, then you're forced into platform-specific behavior." So many things that are easily taken care of in assembly are undefined in C, creating mountains out of molehills in the name of portability.
Mike DeSimone
@Mike DeSimone eloquent.
Rook
Signed overflow is perfectly detectable and avoidable in plain C. -1 for suggesting asm instead of suggesting C code that will generate the correct asm on a good compiler.
R..
@R.. but if it is a valid solution then why would down vote?
Rook
@R I have to agree with Rook.
wheaties
Thanks for the paper :)
Matthieu M.
@R. perfectly detectable, yes; however, for integer operations one needs to care about performance. Many libraries are quite intensive in their integer usage (compression / decompression, encoding / decoding), and it would be great to be able to use safe integers there too.
Matthieu M.
I gave a downvote for an asm answer to a C question. As I have said, there are correct, portable ways to write the check in C which will generate the exact same asm that you would write by hand. Naturally if you use these, the performance impact will be the same, and it will be much less of an impact than the C++ safeint stuff you also recommended.
R..
@Matthieu: If you are writing code that will only be used on one implementation, and that implementation guarantees that something will work, and you need good integer performance, you can certainly use implementation-specific tricks. That isn't what the OP was asking for, though.
David Thornley
Agreed with rook. I guess overflow flags are present in many platforms.
Green Code
C distinguishes implementation-defined behavior and undefined behavior for good reasons, and even if something with UB "works" in **the current version** of your implementation, that doesn't mean it will continue to work in future versions. Consider gcc and signed overflow behavior...
R..
Since I based my -1 on a claim that we could get C code to generate the identical asm, I guess it's only fair to retract it when all the major compilers turn out to be junk in this regard..
R..
+11  A: 

IMHO, the easiest way to deal with overflow-sensitive C++ code is to use SafeInt<T>. This is a cross-platform C++ template hosted on CodePlex which provides the safety guarantees that you desire here.

I find it very intuitive to use as it provides many of the same usage patterns as normal numerical operations and expresses overflows and underflows via exceptions.

JaredPar
+9  A: 

Your approach with subtraction is correct and well-defined. A compiler cannot optimize it away.

Another correct approach, if you have a larger integer type available, is to perform the arithmetic in the larger type and then check that the result fits in the smaller type when converting it back:

#include <limits.h>
#include <assert.h>
#include <stdlib.h>

int sum(int a, int b)
{
    long long c;
    assert(LLONG_MAX > INT_MAX);
    c = (long long)a + b;
    if (c < INT_MIN || c > INT_MAX) abort();
    return c;
}

A good compiler should convert the entire addition and if statement into an int-sized addition and a single conditional jump-on-overflow and never actually perform the larger addition.

Edit: As Stephen pointed out, I'm having trouble getting a (not-so-good) compiler, gcc, to generate the sane asm. The code it generates is not terribly slow, but certainly suboptimal. If anyone knows variants on this code that will get gcc to do the right thing, I'd love to see them.
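(For later readers: GCC 5 and newer, and recent Clang, provide the documented builtin `__builtin_add_overflow`, which compiles to exactly this int-sized addition plus jump-on-overflow; a minimal sketch:)

```c
#include <stdlib.h>

/* Requires GCC 5+ or a recent Clang. */
int sum_checked(int a, int b)
{
    int c;
    if (__builtin_add_overflow(a, b, &c))  /* true if a + b overflows int */
        abort();
    return c;  /* the mathematically correct sum */
}
```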

R..
To anyone wanting to use this, be sure you're looking at my edited version. In the original I stupidly omitted the cast to `long long` before the addition.
R..
Then, by your definition gcc 4.4.4 is not a good compiler. I just tried this at `-O2` and `-O3` and got the same code, 12 lines of assembly (not counting assembler commands) where the actual addition is performed by `leaq (%rax,%rcx), %rcx`, or `addl %eax, %ecx`; `adcl %edx, %ebx` if the `-m32` flag is given.
Mike DeSimone
Also, gcc didn't remove the `if` statement from the generated code.
Mike DeSimone
@R, are you *certain* that the approach with subtraction is correct? Since signed-overflow is undefined, isn't it possible the compiler could see the expression `(INT_MAX - lhs <= rhs)` and optimize it away, thinking that such an expression must *always* be true?
Channel72
Yes I am certain. There is no overflow in the expression `INT_MAX - lhs <= rhs` (assuming `lhs` is positive). Depending on the values of `lhs` and `rhs`, this expression can evaluate to 0 or 1. A compiler cannot optimize away the computation of expressions with nonconstant values. It could, of course, optimize this to an addition followed by an overflow check, if the machine supports such a thing.
R..
Out of curiosity, have you been successful in getting a compiler to do this optimization? A quick test against a few compilers didn't turn up any that could do it.
Stephen Canon
@Channel72: Also importantly, none of the subexpressions can overflow. If `lhs` is an `int` and nonnegative, `INT_MAX - lhs` yields a value between 0 and `INT_MAX`, inclusive, and that's perfectly legal, so nothing about this triggers undefined behavior.
David Thornley
What I can't understand is why someone who is worried about int32 overflows on a 64-bit machine would continue to use int32? And using this in 32-bit mode will be quite inefficient.
ruslik
On x86_64, there's nothing inefficient about using 32-bit integers. Performance is identical to 64-bit ones. One motivation for using smaller-than-native-word-size types is that it's extremely efficient to handle overflow conditions or carry (for arbitrary-precision arithmetic) since the overflow/carry happens in a directly-accessible location.
R..
@Stephen: Oddly I can't get it to work at the moment either, but I've seen plenty of cases where gcc optimizes out unnecessary 64-bit arithmetic like this. I wonder if it's too stupid to figure out how to do it for signed values; I normally only use unsigned arithmetic for things like this.
R..
@R., @Steven: no, the subtraction code that the OP gave is not correct; please see my answer. I also give code there that does it with at most two comparisons. Perhaps the compilers will do better with that.
Jens Gustedt
@R.: after looking into it more closely it is also no wonder that `gcc` doesn't get away with just one jump on overflow. In fact, the conditions for overflow and underflow are in general not symmetric: on today's architectures usually `INT_MIN == (-INT_MAX) - 1`. So the two bound checks *must* be different.
Jens Gustedt
@R.: after thinking it over once more, I am not even convinced that your proposal of blowing it up to a wider type is worth it. Just use the corresponding unsigned type.
Jens Gustedt
@Jens: Which unsigned approach? There are lots and I'm not sure which is most efficient for handling signed inputs.
R..
@Jens: As for gcc, x86 has a perfectly good `jo` instruction (jump on overflow) it should be using.
R..
+1  A: 

I think that this works:

int add(int lhs, int rhs) {
   volatile int sum = lhs + rhs;
   if (lhs != (sum - rhs) ) {
       /* overflow */
       //errno = ERANGE;
       abort();
   }
   return sum;
}

Using volatile keeps the compiler from optimizing away the test because it thinks that sum may have changed between the addition and the subtraction.

Using gcc 4.4.3 for x86_64, the assembly for this code does do the addition, the subtraction, and the test, though it stores everything on the stack and performs unneeded stack operations. I even tried `register volatile int sum =` but the assembly was the same.

For a version with only int sum = (no volatile or register) the function did not do the test and did the addition using only one lea instruction (lea is Load Effective Address and is often used to do addition without touching the flags register).

Your version is larger code and has a lot more jumps, but I don't know which would be better.

nategoose
`register volatile int sum asm("eax"); sum = lhs + rhs;` yielded better assembly.
nategoose
-1 for misuse of `volatile` to mask undefined behavior. If it "works", you're still just "getting lucky".
R..
@R: If it doesn't work the compiler isn't implementing `volatile` correctly. All I was trying for was a simpler solution to a very common problem on an already answered question.
nategoose
Where it might fail, though, would be a system whose numeric representations don't wrap to lower values upon overflow for integers.
nategoose
+1  A: 

To my mind, the simplest check would be checking the signs of the operands and of the result.

Let's examine the sum: overflow can occur in either direction, + or -, only when both operands have the same sign. And, obviously, overflow happens when the sign of the result is not the same as the sign of the operands.

So, a check like this will be enough:

int a, b, sum;
sum = a + b;
if  (((a ^ ~b) & (a ^ sum)) & 0x80000000)
    detect_overflow();

Edit: as Nils suggested, this is the correct if condition:

((((unsigned int)a ^ ~(unsigned int)b) & ((unsigned int)a ^ (unsigned int)sum)) & 0x80000000)

And since when does the instruction

add eax, ebx

lead to undefined behavior? There is no such thing in the Intel x86 instruction set reference.

ruslik
You're missing the point here. Your second line of code `sum = a + b` could yield undefined behavior.
Channel72
if you cast sum, a and b to unsigned during your test-addition your code will work btw..
Nils Pipenbrinck
It's undefined not because the program will crash or will behave differently. It's the exact thing the processor is doing to compute the OF flag. The standard is just trying to protect itself from non-standard cases, but it doesn't mean that you are not allowed to do this.
ruslik
@Nils yeah, I wanted to do that, but I thought that four `(unsigned int)`s would make it much more unreadable. (you know, you first read it, and only try it if you like it).
ruslik
Well you could make it more readable by taking out the useless `int` keywords. `unsigned int` is just an overly verbose way of writing `unsigned`.
R..
+1  A: 

How about:

int sum(int n1, int n2)
{
  int result;
  if (n1 >= 0)
  {
    result = (n1 - INT_MAX)+n2; /* Can't overflow */
    if (result > 0) return INT_MAX; else return (result + INT_MAX);
  }
  else
  {
    result = (n1 - INT_MIN)+n2; /* Can't overflow */
    if (0 > result) return INT_MIN; else return (result + INT_MIN);
  }
}

I think that should work for any legitimate INT_MIN and INT_MAX (symmetrical or not); the function as shown clips, but it should be obvious how to get other behaviors.

supercat
+1 for nice alternate approach that's perhaps more intuitive.
R..
A: 

The obvious solution is to convert to unsigned, to get the well-defined unsigned overflow behavior:

int add(int lhs, int rhs) 
{ 
   int sum = (unsigned)lhs + (unsigned)rhs; 
   if ((lhs >= 0 && sum < rhs) || (lhs < 0 && sum > rhs)) { 
      /* an overflow has occurred */ 
      abort(); 
   } 
   return sum;  
} 

This replaces the undefined signed overflow behavior with the implementation-defined conversion of out-of-range values between signed and unsigned, so you need to check your compiler's documentation to know exactly what will happen. It should at least be well defined, though, and should do the right thing on any two's-complement machine that doesn't raise signals on conversions, which is pretty much every machine and C compiler built in the last 20 years.
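A variant that keeps the detection entirely in unsigned arithmetic, and only performs the signed addition once it is known to be in range (so even the implementation-defined conversion back is avoided; a sketch, not necessarily optimal):

```c
#include <limits.h>
#include <stdlib.h>

int add(int lhs, int rhs)
{
    unsigned int a = (unsigned int)lhs;
    unsigned int b = (unsigned int)rhs;
    unsigned int usum = a + b;             /* wraps, never UB */
    unsigned int sign = ~(UINT_MAX >> 1);  /* sign-bit mask */
    /* overflow iff both operands carry the sign bit that the result lacks */
    if (((a ^ usum) & (b ^ usum)) & sign)
        abort();
    return lhs + rhs;  /* provably in range now, so no UB */
}
```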

Chris Dodd
You're still storing the result in `sum`, which is an `int`. That results in either an implementation-defined result or an implementation-defined signal being raised if the value of `(unsigned)lhs + (unsigned)rhs` is greater than `INT_MAX`.
R..
@R: that's the whole point -- the behavior is implementation defined, rather than undefined, so the implementation must document what it does, and do it consistently. A signal can only be raised if the implementation documents it, in which case it must always be raised and you can use that behavior.
Chris Dodd
+6  A: 

No, your 2nd code isn't correct, but you are close: if you set

int half = INT_MAX/2;
int half1 = half + 1;

then the result of the addition half + half1 is INT_MAX (INT_MAX is always an odd number). So this is valid input. But in your routine you will have INT_MAX - half == half1 and you would abort: a false positive.

This error can be repaired by putting < instead of <= in both checks.

But then also your code isn't optimal. The following would do:

int add(int lhs, int rhs)
{
 if (lhs >= 0) {
  if (INT_MAX - lhs < rhs) {
   /* overflow has occurred */
   abort();
  }
 }
 else {
  if (rhs < INT_MIN - lhs) {
   /* overflow has occurred */
   abort();
  }
 }
 return lhs + rhs;
}

To see that this is valid, you have to symbolically add lhs to both sides of the inequalities; this gives you exactly the arithmetic conditions under which your result would be out of bounds.
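To convince yourself on the boundary case from the top of this answer, here is a self-contained check (the routine is copied so the snippet compiles on its own):

```c
#include <limits.h>
#include <stdlib.h>

int add(int lhs, int rhs)
{
 if (lhs >= 0) {
  if (INT_MAX - lhs < rhs) abort();  /* overflow */
 }
 else {
  if (rhs < INT_MIN - lhs) abort(); /* underflow */
 }
 return lhs + rhs;
}
```

With half = INT_MAX/2 and half1 = half + 1, add(half, half1) now correctly returns INT_MAX instead of aborting.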

Jens Gustedt
+1 for catching subtle mistake we all missed!
R..