views:

226

answers:

4

In C bitwise left shift operation invokes Undefined Behaviour when the left side operand has negative value.

Relevant quote from ISO C99 (6.5.7/4)

The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are filled with zeros. If E1 has an unsigned type, the value of the result is E1× 2E2, reduced modulo one more than the maximum value representable in the result type. If E1 has a signed type and nonnegative value, and E1× 2E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.

But in C++ the behaviour is well defined.

ISO C++-03 (5.8/2)

The value of E1 << E2 is E1 (interpreted as a bit pattern) left-shifted E2 bit positions; vacated bits are zero-filled. If E1 has an unsigned type, the value of the result is E1 multiplied by the quantity 2 raised to the power E2, reduced modulo ULONG_MAX+1 if E1 has type unsigned long, UINT_MAX+1 otherwise. [Note: the constants ULONG_MAXand UINT_MAXare defined in the header ). ]

That means

int a = -1, b=2, c;
c= a << b ;

invokes Undefined Behaviour in C but the behaviour is well defined in C++.

What forced the ISO C++ committee to consider that behaviour well defined as opposed to the behaviour in C?

On the other hand the behaviour is implementation defined for bitwise right shift operation when the left operand is negative, right?

My question is why does left shift operation invoke Undefined Behaviour in C and why does right shift operator invoke just Implementation defined behaviour?

P.S : Please don't give answers like "It is undefined behaviour because the Standard says so". :P

A: 

A lot of these kind of things are a balance between what common CPUs can actually support in a single instruction and what's useful enough to expect compiler-writers to guarantee even if it takes extra instructions. Generally, a programmer using bit-shifting operators expects them to map to single instructions on CPUs with such instructions, so that's why there's undefined or implementation behaviour where CPUs had various handling of "edge" conditions, rather than mandating a behaviour and having the operation be unexpectedly slow. Keep in mind that the additional pre/post or handling instructions may be made even for the simpler use cases. undefined behaviour may have been necessary where some CPUs generated traps/exceptions/interrupts (as distinct from C++ try/catch type exceptions) or generally useless/inexplicable results, while if the set of CPUs considered by the Standards Committee at the time all provided at least some defined behaviour, then they could make the behaviour implementation defined.

Tony
+7  A: 

The paragraph you copied is talking about unsigned types. The behavior is undefined in C++. From the last C++0x draft:

The value of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are zero-filled. If E1 has an unsigned type, the value of the result is E1 × 2E^2, reduced modulo one more than the maximum value representable in the result type. Otherwise, if E1 has a signed type and non-negative value, and E1×2E^2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.

EDIT: got a look at C++98 paper. It just doesn't mention signed types at all. So it's still undefined behavior.

Right-shift negative is implementation defined, right. Why? In my opinion: It's easy to implementation-define because there is no truncation from the left issues. When you shift left you must say not only what's shifted from the right but also what happens with the rest of the bits e.g. with two's complement representation, which is another story.

ybungalobill
@ybungalobill : This paragraph is neither present in `C++03` nor in `C++98`.
Prasoon Saurav
@Prasoon Saurav: That paragraph is part of the current C++0x final draft, and it shows that the C++ Standard Committee considered that to be a flaw in the current standard and fixed it by actually stating that it is undefined --instead of implicitly not defining the result.
David Rodríguez - dribeas
@David "EDIT: got a look at C++98 paper. It just doesn't mention signed types at all. So it's still undefined behavior." I don't agree with that interpretation. "The value of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are zero-filled." is a clear statement, and does not exclude signed types. I think they just overlooked the case of signed negative operands.
Johannes Schaub - litb
A: 

To answer your real question as stated in the title: as for any operation on a signed type, this provokes UB if the result of the mathematical operation doesn't fit in the target type (under- or overflow). Signed integer types are designed like that.

For the left shift operation if the value is positive or 0, the definition of the operator as a multiplication with a power of 2 makes sense, so everything is ok, unless the result overflows, nothing surprising.

If the value is negative, you could have the same interpretation of multiplication with a power of 2, but if you just think in terms of bit shift, this would be perhaps surprising. Obviously the standards committee wanted to avoid such ambiguity.

My conclusion:

  • if you want to do real bit pattern operations use unsigned types
  • if you want to multiply a value (signed or not) by a power of two, do just that, something like

    i * (1u << k)

your compiler will transform this into decent assembler in any case.

Jens Gustedt
A: 

In C bitwise left shift operation invokes Undefined Behaviour when the left side operand has negative value. [...] But in C++ the behaviour is well defined. [...] why [...]

The easy answer is: Becuase the standards say so.

A longer answer is: It has probably something to do with the fact that C and C++ both allow other representations for negative numbers besides 2's complement. Giving fewer guarantees on what's going to happen makes it possible to use the languages on other hardware including obscure and/or old machines.

For some reason, the C++ standardization committee felt like adding a little guarantee about how the bit representation changes. But since negative numbers still may be represented via 1's complement or sign+magnitude the resulting value possibilities still vary.

Assuming 16 bit ints, we'll have

 -1 = 1111111111111111  // 2's complement
 -1 = 1111111111111110  // 1's complement
 -1 = 1000000000000001  // sign+magnitude

Shifted to the left by 3, we'll get

 -8 = 1111111111111000  // 2's complement
-15 = 1111111111110000  // 1's complement
  8 = 0000000000001000  // sign+magnitude

What forced the ISO C++ committee to consider that behaviour well defined as opposed to the behaviour in C?

I guess they made this guarantee so that you can use << appropriately when you know what you're doing (ie when you're sure your machine uses 2's complement).

On the other hand the behaviour is implementation defined for bitwise right shift operation when the left operand is negative, right?

I'd have to check the standard. But you may be right. A right shift without sign extension on a 2's complement machine isn't particularly useful. So, the current state is definitely better than requiring vacated bits to be zero-filled because it leaves room for machines that do a sign extensions -- even though it is not guaranteed.

sellibitze