views:

868

answers:

7

perhaps it's a very stupid question but I'm having a hard time figuring this out =)

in C or C++ it is said that the maximum number a size_t (an unsigned int data type) can hold is the same as casting -1 to that data type. for example see http://stackoverflow.com/questions/1420982/invalid-value-for-sizet

Why?? I'm confused..

I mean, (talking about 32 bit ints) AFAIK the most significant bit holds the sign in a signed data type (that is, bit 0x80000000 to form a negative number). then, 1 is 0x00000001.. 0x7FFFFFFFF is the greatest positive number a int data type can hold.

then, AFAIK the binary representation of -1 int should be 0x80000001 (perhaps I'm wrong). why/how this binary value is converted to anything completely different (0xFFFFFFFF) when casting ints to unsigned?? or.. how is it possible to form a binary -1 out of 0xFFFFFFFF?

I have no doubt that in C: ((unsigned int)-1) == 0xFFFFFFFF or ((int)0xFFFFFFFF) == -1 is equally true than 1 + 1 == 2, I'm just wondering why.

thanks!

+3  A: 

That is two's complement encoding.

The main bonus is that you get the same encoding whether you are using an unsigned or signed int. If you subtract 1 from 0 the integer simply wraps around. Therefore 1 less than 0 is 0xFFFFFFFF.

Goz
Wow, slow internet at work killed me on this one... 4 answers while it was loading >_<
Adam W
+2  A: 

Take a look here and you will understand: Two's Complement

Konamiman
+19  A: 

It's called two's complement. To make a negative number, invert all the bits then add 1. So to convert 1 to -1, invert it to 0xFFFFFFFE, then add 1 to make 0xFFFFFFFF.

As to why it's done this way, Wikipedia says:

The two's-complement system has the advantage of not requiring that the addition and subtraction circuitry examine the signs of the operands to determine whether to add or subtract. This property makes the system both simpler to implement and capable of easily handling higher precision arithmetic.

Mark Ransom
P.S. I worked on a one's complement machine once. It was weird, having both positive zero and negative zero.
Mark Ransom
Just like floating point? *ducks*
Goz
It never occurred to me that floating point has negative zero, but I see that you are right: http://en.wikipedia.org/wiki/Signed_zero
Mark Ransom
The old Control Data Cyber series used one's complement. The machine language was odd: the standard comparison instruction would treat +0 as greater than -0, the standard equality instruction would treat them as unequal, but there was an "is it zero?" instruction that in essence returned true for -0 and +0 and false for anything else. Fortunately, I never had to deal much with that. Equally fortunately, I never had to do low-level text processing, since it fit 6-bit characters 10 to a machine word, and lowercase was handled by a mixed 6- and 12-bit representation.
David Thornley
The answer doesn't change even if you're on a ones' complement machine, or any other weird machine. For details, please see my answer. As a minor nit, it's "ones' complement", not "one's complement". From Knuth:A two's complement number is complemented with respect to a single power of 2, while a ones' complement number is complemented with respect to a long sequence of 1s. Indeed, there is also a "twos' complement notation," which has radix 3 and complementation with respect to (2...22)_3.
Alok
@David: Congratulations for guessing the machine where I first learned assembly. @Alok: I wasn't aware that the standard made guarantees like this, I've never used C or C++ on any machine that didn't use two's complement. I'm guessing they're extremely rare.
Mark Ransom
@Mark Ransom: Perhaps it's more accurate to say "make a number negative" or even "switch the sign of a number" than "make a negative number". As long as the highest bit is set, you've made a negative number.
Shmoopty
@David: on the Control Data's, you could cheat a bit:if you added zero to a number, it would turn negative zero into positive zero, which made comparisons simple and straightforward.
Jerry Coffin
A: 

Look up 2's complement: http://en.wikipedia.org/wiki/Two%27s%5Fcomplement

Joe
+20  A: 

C and C++ can run on many different architectures, and machine types. Consequently, they can have different representations of numbers: Two's complement, and Ones' complement being the most common. In general you should not rely on a particular representation in your program.

For unsigned integer types (size_t being one of those), the C standard (and the C++ standard too, I think) specifies precise overflow rules. In short, if SIZE_MAX is the maximum value of the type size_t, then the expression

(size_t) (SIZE_MAX + 1)

is guaranteed to be 0, and therefore, you can be sure that (size_t) -1 is equal to SIZE_MAX. The same holds true for other unsigned types.

Note that the above holds true:

  • for all unsigned types,
  • even if the underlying machine doesn't represent numbers in Two's complement. In this case, the compiler has to make sure the identity holds true.

Also, the above means that you can't rely on specific representations for signed types.

Edit: In order to answer some of the comments:

Let's say we have a code snippet like:

int i = -1;
long j = i;

There is a type conversion in the assignment to j. Assuming that int and long have different sizes (most [all?] 32-bit systems), the bit-patterns at memory locations for i and j are going to be different, because they have different sizes. The compiler makes sure that the values of i and j are -1.

Similarly, when we do:

size_t s = (size_t) -1

There is a type conversion going on. The -1 is of type int. It has a bit-pattern, but that is irrelevant for this example because when the conversion to size_t takes place due to the cast, the compiler will translate the value according to the rules for the type (size_t in this case). Thus, even if int and size_t have different sizes, the standard guarantees that the value stored in s above will be the maximum value that size_t can take.

If we do:

long j = LONG_MAX;
int i = j;

If LONG_MAX is greater than INT_MAX, then the value in i is implementation-defined (C89, section 3.2.1.2).

Alok
Voted up because you're the first person who noted that `(size_t)-1` is because of the arithmetic rules that C specifies for unsigned numbers, not because of the underlying representation. (By the way, `SIZE_MAX` is the macro).
caf
Thanks! I didn't want to use `SIZE_MAX` because it's not in C89, and also because I was trying to make a general point. Still, I think I could have mentioned it.
Alok
"even if the underlying machine doesn't represent numbers in two's complement" - given that two's complement is a way of representing _signed_ numbers, how would it ever apply to unsigned integers?
Pavel Minaev
If the standard guarantees that SIZE_T_MAX+1 == 0, does that automatically translate to (size_t)-1 == SIZE_T_MAX? I'm not sure it does, especially if the compiler needs to emit special code to check for that case. I'm not even sure I would guarantee (size_t) 0 - (size_t) 1 == SIZE_T_MAX. Maybe the standard has more to say on the subject than you've implied, I don't have it to hand.
Mark Ransom
Mark: "Unsigned integers shall obey the laws of arithmetic modulo 2**n where n is the number of bits in the value representation of that particular size of integer." [3.9.1/4, C++03]
Roger Pate
Edited my post for clarification. Also see Roger's reply.
Alok
@Roger, thanks for that. Seems to say (size_t)0 - (size_t)1 will give the desired answer. Still not convinced that it applies to (size_t)-1 however, since -1 is outside the range handled by size_t. P.S. should we argue about angels on the head of a pin after this?
Mark Ransom
@Alok, your edit better answered my question. To say it in simple words, no matter the internal representation of a binary number; it is irrelevant for C and its integer arithmetic rules. In conclusion, given that there are different hardware representations for negative integer, there is no way to manipulate them at the bit level in C to "make" negative numbers out of their bits, is this correct?
conejoroy
@conejoroy: It's possible to do it, but you have to be extremely careful, and the solution is most likely not going to be portable. First, you need to know the size of the underlying object (easy, sizeof will do that). You also need to know the endianness of the machine (also fairly easy, although there are some gotchas - and there might be systems with very "weird" endianness). Then, your compiler may add padding bits and/or have trap representations (bit-patterns that don't make any sense). So, it's best to not do it.
Alok
+1 for the clear answer. Liking how you stress the *value* aspect.
Johannes Schaub - litb
+5  A: 

Your first question, about why (unsigned)-1 gives the largest possible unsigned value is only accidentally related to two's complement. The reason -1 cast to an unsigned type gives the largest value possible for that type is because the standard says the unsigned types "follow the laws of arithmetic modulo 2n where n is the number of bits in the value representation of that particular size of integer."

Now, for 2's complement, the representation of the largest possible unsigned value and -1 happen to be the same -- but even if the hardware uses another representation (e.g. 1's complement or sign/magnitude), converting -1 to an unsigned type must still produce the largest possible value for that type.

Jerry Coffin
A: 

Two's complement is very nice for doing subtraction just like addition :)

    11111110 (254 or -2)
   +00000001 (  1)
   ---------
    11111111 (255 or -1)

    11111111 (255 or -1) 
   +00000001 ( 1)
   ---------
   100000000 ( 0 + 256)
pmg