views:

666

answers:

13

In case of integer overflows what is the result of (unsigned int) * (int) ? unsigned or int?

I was auditing the following function, and suddenly I came out with that question. In the below function, it is vulnerable at line 17.

1.  // Create a character array and initialize it with init[] 
2.  // repeatedly. The size of this character array is specified by 
3.  // w*h.
4.  char *function4(unsigned int w, unsigned int h, char *init)
5.  {
6.      char *buf;
7.      int i;
8.  
9.      if (w*h > 4096)
10.         return (NULL);
11. 
12.     buf = (char *)malloc(4096+1);
13.     if (!buf)
14.        return (NULL);
15. 
16.     for (i=0; i<h; i++)
17.         memcpy(&buf[i*w], init, w);
18. 
19.     buf[4096] = '\0';
20. 
21.     return buf;
22. }

Consider both w and h are very large unsigned integers. The multiplication at line 9 have a chance to pass the validation.

Now the problem is at line 17. Multiply int i with unsigned int w: if the result is int, it is possible that the product is negative, resulting in accessing a position that is before buf. If the result is unsigned int, the product will always be positive, resulting in accessing a position that is after buf.

It's hard to write code to justify this: int is too large. Does anyone has ideas on this?


Added:

The question here is not to discuss how bad the function is, or how to improve the function to make it better.

The question is to ask:

`what is the result of (unsigned int) * (int) ? unsigned or int?`

Is there any documentation that specifies this? I am searching for it.


Added:

I guess there is no need to discuss whether (unsigned int) * (int) produces unsigned int or int. Because from C's perspective, they are bytes. Therefore, the following code holds:

unsigned int x = 10;
int y = -10;

printf("%d\n", x * y);  // print x * y in signed integer
printf("%u\n", x * y);  // print x * y in unsigned integer

Therefore, it does not matter what type the multiplication returns. It matters that whether the consumer function takes int or unsigned.

So, now the question becomes,

"Does the array indexer `somearray[value]` takes an `int` as input, 
or an `unsigned ` as input?"


A: 

Why not just declare i as unsigned int? Then the problem goes away.

In any case, i*w is guaranteed to be <= 4096, as the code tests for this, so it's never going to overflow.

Steve Melnikoff
w*h may overflow to be less than 4096, though
rampion
Wrong, the code did not ensure than `i*w` will be <= 4096 because the unsigned multiplication will wrap modulo `UINT_MAX+1`.
R..
+2  A: 

Ensure that w * h doesn't overflow by limiting w and h.

starblue
Rather, use `if (!h || w > 4096/h) return NULL;`.
R..
A: 

memcpy(&buf[i*w > -1 ? i*w < 4097? i*w : 0 : 0], init, w); I don't think the triple calculation of i*w does degrade the perfomance)

+2  A: 

do the w*h calculation in long long, check if bigger than MAX_UINT

EDIT : alternative : if overflown (w*h)/h != w (is this always the case ?! should be, right ?)

qwerty
that would fail if h is 0
Spudd86
Bad advice. Try `if (!h || w > 4096/h) return NULL;`.
R..
A: 

To actually answer your question, without specifying the hardware you're running on, you don't know, and in code intended to be portable, you shouldn't depend on any particular behavior.

Charlie Martin
Unsigned arithmetic is defined in C and C++. Given the maximum unsigned int value (which is available in a header somewhere), you can predict what happens precisely.
David Thornley
A: 

w*h could overflow if w and/or h are sufficiently large and the following validation could pass.

9.      if (w*h > 4096)
10.         return (NULL);

On int , unsigned int mixed operations, int is elevated to unsigned int, in which case, a negative value of 'i' would become a large positive value. In that case

&buf[i*w]

would be accessing a out of bound value.

Indeera
A: 

Unsigned arithmetic is done as modular (or wrap-around), so the product of two large unsigned ints can easily be less than 4096. The multiplication of int and unsigned int will result in an unsigned int (see section 4.5 of the C++ standard).

Therefore, given large w and a suitable value of h, you can indeed get into trouble.

Making sure integer arithmetic doesn't overflow is difficult. One easy way is to convert to floating-point and doing a floating-point multiplication, and seeing if the result is at all reasonable. As qwerty suggested, long long would be usable, if available on your implementation. (It's a common extension in C90 and C++, does exist in C99, and will be in C++0x.)

David Thornley
Can you tell me where I can find a online copy of C++ standard?
yinyueyouge
I got it from http://webstore.ansi.org/ - search for C++ standard. It costs $30. ANSI and other standardization organizations typically finance themselves by charging for copies of their standards.
David Thornley
Ridiculous advice - convert to floating point?!? Try `if (!h || w > 4096/h) return NULL;`.
R..
+1  A: 

2 changes make it safer:

if (w >= 4096 || h >= 4096 || w*h > 4096)  return NULL;

...

unsigned i;

Note also that it's not less a bad idea to write to or read from past the buffer end. So the question is not whether i*w may become negative, but whether 0 <= i*h +w <= 4096 holds.

So it's not the type that matters, but the result of h*i. For example, it doesn't make a difference whether this is (unsigned)0x80000000 or (int)0x80000000, the program will seg-fault anyway.

Ingo
+1  A: 

To answer your question: the type of an expression multiplying an int and an unsigned int will be an unsigned int in C/C++.

To answer your implied question, one decent way to deal with possible overflow in integer arithmetic is to use the "IntSafe" set of routines from Microsoft:

http://blogs.msdn.com/michael_howard/archive/2006/02/02/523392.aspx

It's available in the SDK and contains inline implementations so you can study what they're doing if you're on another platform.

Michael Burr
+2  A: 

The type of w*i is unsigned in your case. If I read the standard correctly, the rule is that the operands are converted to the larger type (with its signedness), or unsigned type corresponding to the signed type (which is unsigned int in your case).

However, even if it's unsigned, it doesn't prevent the wraparound (writing to memory before buf), because it might be the case (on i386 platform, it is), that p[-1] is the same as p[-1u]. Anyway, in your case, both buf[-1] and buf[big unsigned number] would be undefined behavior, so the signed/unsigned question is not that important.

Note that signed/unsigned matters in other contexts - eg. (int)(x*y/2) gives different results depending on the types of x and y, even in the absence of undefined behaviour.

I would solve your problem by checking for overflow on line 9; since 4096 is a pretty small constant and 4096*4096 doesn't overflow on most architectures (you need to check), I'd do

if (w>4096 || h>4096 || w*h > 4096)
     return (NULL);

This leaves out the case when w or h are 0, you might want to check for it if needed.

In general, you could check for overflow like this:

if(w*h > 4096 || (w*h)/w!=h || (w*h)%w!=0)
jpalecek
+2  A: 

In C/C++ the p[n] notation is really a shortcut to writting *(p+n), and this pointer arithmetics take into account the sign. So p[-1] is valid and refer to the value immediately before *p.

So the sign really matters here, the result of arithmetic operator with integer follow a set of rules defined by the standard, and this is called integer promotions. Check this https://www.securecoding.cert.org/confluence/display/seccode/INT02-C.+Understand+integer+conversion+rules

Ismael
+1  A: 

For C, refer to "Usual arithmetic conversions" (C99: Section 6.3.1.8, ANSI C K&R A6.5) for details on how the operands of the mathematical operators are treated.

In your example the following rules apply:

C99:

Otherwise, if the type of the operand with signed integer type can represent all of the values of the type of the operand with unsigned integer type, then the operand with unsigned integer type is converted to the type of the operand with signed integer type.

Otherwise, both operands are converted to the unsigned integer type corresponding to the type of the operand with signed integer type.

ANSI C:

Otherwise, if either operand is unsigned int, the other is converted to unsigned int.

Trent
A: 

There are 3 paragraphs in the current C1X draft on calculating (UNSIGNED TYPE1) X (SIGNED TYPE2) in 6.3.1.8 Usual arithmetic coversions, N1494,

WG 14: C - Project status and milestones

Otherwise, if the operand that has unsigned integer type has rank greater or equal to the rank of the type of the other operand, then the operand with signed integer type is converted to the type of the operand with unsigned integer type.

Otherwise, if the type of the operand with signed integer type can represent all of the values of the type of the operand with unsigned integer type, then the operand with unsigned integer type is converted to the type of the operand with signed integer type.

Otherwise, both operands are converted to the unsigned integer type corresponding to the type of the operand with signed integer type.

So if a is unsigned int and b is int, parsing of (a * b) should generate code (a * (unsigned int)b). Will overflow if b < 0 or a * b > UINT_MAX.

If a is unsigned int and b is long of greater size, (a * b) should generate ((long)a * (long)b). Will overflow if a * b > LONG_MAX or a * b < LONG_MIN.

If a is unsigned int and b is long of the same size, (a * b) should generate ((unsigned long)a * (unsigned long)b). Will overflow if b < 0 or a * b > ULONG_MAX.

On your second question about the type expected by "indexer", the answer appears "integer type" which allows for any (signed) integer index.

6.5.2.1 Array subscripting

Constraints

1 One of the expressions shall have type ‘‘pointer to complete object type’’, the other expression shall have integer type, and the result has type ‘‘type’’.

Semantics

2 A postfix expression followed by an expression in square brackets [] is a subscripted designation of an element of an array object. The definition of the subscript operator [] is that E1[E2] is identical to (*((E1)+(E2))). Because of the conversion rules that apply to the binary + operator, if E1 is an array object (equivalently, a pointer to the initial element of an array object) and E2 is an integer, E1[E2] designates the E2-th element of E1 (counting from zero).

It is up to the compiler to perform static analysis and warn the developer about possibility of buffer overrun when the pointer expression is an array variable and the index may be negative. Same goes about warning on possible array size overruns even when the index is positive or unsigned.

eel ghEEz