tags:

views:

520

answers:

5

Say I have an expression like this

short v = ( ( p[ i++ ] & 0xFF ) << 4 | ( p[ i ] & 0xF0000000 ) >> 28;

with p being a pointer to a dynamically allocated array of 32 bit integers.

When exactly will i be incremented? I noticed that the above code delivers a different value for v than the following code:

short v = ( p[ i++ ] & 0xFF) <<   4;
v |= ( p[ i ] & 0xF0000000 ) >>  28;

My best guess for this behaviour is that i is not incremented before the right side of the above | is evaluated.

Any insight would be appreciated!

Thanks in advance,

\Bjoern

+9  A: 

The first example is undefined behavior. You cannot read a variable more than once in an expression that also changes the value of the variable. See this (among other places on the Internet).

Tyler McHenry
You can't even read it one in all cases. (a[b] = (b = 5) for instance is undefined)
AProgrammer
That actually reads it twice. The expression (b = 5) has the value of b, so b is read once as an array index on the left side, and then again as the value of the expression (b = 5) on the right.
Tyler McHenry
It isn't read. The value of b = 5 is the value written, not a value read after writen 5 to it. (It could makes a difference if b is volatile).
AProgrammer
@AProgrammer: The problem is not caused by read accesses though. The problem is that their is a write within the expression and that the order of evaluation of sub expressions is undefined. (If there was a single assignment in the expression their would not be a problem).
Martin York
+3  A: 

Sometimes before the end of the expression.

It is undefined to read an object which is also modified for something else than determining the new value as it is undefined to write an object twice. And you may even get inconsistant value (i.e. reading something which isn't the old nor the new value).

AProgrammer
+11  A: 

i is incremented sometime before the next sequence point. The only sequence point in the expression you have given is at the end of the statement - so "sometime before the end of the statement" is the answer in this case.

That's why you shouldn't both modify an lvalue and read its value without an intervening sequence point - the result is indeterminate.

The &&, ||, comma and ? operators introduce sequence points, as well as the end of an expression and a function call (the latter means that if you do f(i++, &i), the body of f() will see the updated value if it uses the pointer to examine i).

caf
You can modify an object and read it if you read it to determine the new value. In that case, you can read it several times (b = b + b; is valid). Note that you can't do f(i++, i), the comma here isn't the comma operator.
AProgrammer
+1  A: 

Your expression has undefined behavior, see for example this about sequence points in C and C++ statements.

Nikolai N Fetissov
+8  A: 

The problem is order of evaluation:
The C++ standard does not define the order of evaluation of sub expressions. This is done so that the compiler can be as aggressive as possible in optimizations.

Lets break it down:

           a1                        a2
v = ( ( p[ i++ ] & 0xFF ) << 4 | ( p[ i ] & 0xF0000000 ) >> 28;

-----
(1) a1 = p[i]
(2) i  = i + 1 (i++)       after (1)      
(3) a2 = p[i]
(4) t3 = a1 & 0xFF         after (1)
(5) t4 = a2 & 0xF0000000   after (3)
(6) t5 = t3 << 4           after (4)
(7) t6 = t4 >> 28          after (5)
(8) t7 = t5 | t6           after (6) and (7)
(9) v  = t7                after (8)

Now the compiler is free to re-arrange thus sub expressions as long as the above 'after' clauses are not violated. So one quick easy optimization is move 3 up one slot and then do common expression removal (1) and (3) (now beside each other) are the same and thus we can eliminate (3)

But the compiler does not have to do the optimization (and is probably better than me at it and has other tricks up its sleeve). But you can see how the value of (a1) will always be what you expect, but the value of (a2) will depend on what order the compiler decides to do the other sub-expressions.

The only guarantees that you have that the compiler can not move sub-expressions past a sequence point. Your most common sequence point is ';' (the end of the statement). There are others, but I would avoid using this knowledge as most people don't know the compiler workings that well. If you write code that uses sequence point tricks then somebody may re-factor the code to make it look more readable and now your trick has just turned into undefined be-behavior.

short v = ( p[ i++ ] & 0xFF) <<   4;
v |= ( p[ i ] & 0xF0000000 ) >>  28;

-----
(1) a1 = p[i]
(2) i  = i + 1 (i++)       after (1)      
(4) t3 = a1 & 0xFF         after (1)
(6) t5 = t3 << 4           after (4)
(A) v = t5                 after (6)
------ Sequence Point
(3) a2 = p[i]
(5) t4 = a2 & 0xF0000000   after (3)
(7) t6 = t4 >> 28          after (5)
(8) t7 = v | t6            after (7)
(9) v  = t7                after (8)

Here everything is well defined as the write to i is sued in place and not re-read in the same expression.

Simple rule. don't use ++ or -- operators inside a larger expression. Your code looks just as readable like this:

++i; // prefer pre-increment (it makes no difference here, but is a useful habit)
v = ( ( p[ i ] & 0xFF ) << 4 | ( p[ i ] & 0xF0000000 ) >> 28;

See this article for detailed explanation of evaluation order:
http://stackoverflow.com/questions/367633/what-are-all-the-common-undefined-behaviour-that-c-programmer-should-know-about/367690#367690

Martin York
+1: Very good explanation! But your last suggestion is wrong! It has to be:v = ( ( p[ i ] ++i;
rstevens
@rstevens. My last expression depends on what 'Bjoern' was trying to do, since his current code has undefined behavior deciding what the code 'has to be' will depend on how 'Bjoern' interpreted what i++ was doing, and thus without further context there are multiple different veriants that may be potentially correct. I see my interpretation as one of these veriants and I am sure 'Bjoern' can extrapolate what he wants to do from there.
Martin York
PS. I agree that the ++i could potentially go after the expression, but I disagree that the second access to p needs the +1 p[i +1] as in your 'has to be' expression. Thus your answer 'has to be' wrong. ;-) See absolutes are rarly correct.
Martin York