I recently ran into an issue where I wasn't getting the numerical result I expected. I tracked it down to the problem that is illustrated by the following example:

#include <stdio.h>

int main()
{
  double sample = .5;
  int a = (int)(sample * (1 << 31));
  int b = (int)(sample * (1 << 23) * (1 << 8));
  printf("a = %#08x, b = %#08x\n", a, b);
}
// Output is: a = 0xc0000000, b = 0x40000000

Why is the result of multiplying by (1 << 31) different than the result of multiplying by (1 << 23) * (1 << 8)? I expected the two to give the same answer but they don't.

I should note that all my floating point values are in the range [-1, 1).

+13  A: 

You expect identical results because you assume that multiplying by (1 << 31) is the same as multiplying by (1 << 23) and then by (1 << 8). In the general case they are not the same. You are performing the (1 << 31) calculation in the signed int domain. If your platform uses 32-bit ints, the (1 << 31) expression overflows, while neither (1 << 23) nor (1 << 8) overflows. Signed overflow is undefined behavior in C, so the result of the first multiplication is unpredictable. On your platform the overflowed shift evidently produced the most negative int, -2147483648, which is why .5 times it came out as -1073741824, i.e. 0xc0000000.

In other words, it doesn't make any sense to do (1 << 31) on a platform that has only 31 bits in the value representation of the int type. You need at least 32 value-forming bits to meaningfully calculate (1 << 31).

If you want your (1 << 31) to make sense, calculate it in the unsigned domain: (1u << 31), (1u << 23) and (1u << 8). That should give you consistent results. Alternatively, you can use a larger signed integer type.

AndreyT
Very clear explanation, thank you.
sbooth