views:

966

answers:

9

There's been some debate going on in this question about whether the following code is legal C++:

std::list<item*>::iterator i = items.begin();
while (i != items.end())
{
    bool isActive = (*i)->update();
    if (!isActive)
    {
        items.erase(i++);  // *** Is this undefined behavior? ***
    }
    else
    {
        other_code_involving(*i);
        ++i;
    }
}

The problem here is that erase() will invalidate the iterator in question. If that happens before i++ is evaluated, then incrementing i like that is technically undefined behavior, even if it appears to work with a particular compiler. One side of the debate says that all function arguments are fully evaluated before the function is called. The other side says, "the only guarantees are that i++ will happen before the next statement and after i++ is used. Whether that is before erase(i++) is invoked or afterwards is compiler dependent."

I opened this question to hopefully settle that debate.

+25  A: 

Quoth the C++ standard 1.9.16:

When calling a function (whether or not the function is inline), every value computation and side effect associated with any argument expression, or with the postfix expression designating the called function, is sequenced before execution of every expression or statement in the body of the called function. (Note: Value computations and side effects associated with the different argument expressions are unsequenced.)

So it would seem to me that this code:

foo(i++);

is perfectly legal. It will increment i and then call foo with the previous value of i. However, this code:

foo(i++, i++);

yields undefined behavior because paragraph 1.9.16 also says:

If a side effect on a scalar object is unsequenced relative to either another side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined.

Kristo
It ain't undefined here. It will do both, but in a undefined order.
Migol
The second one is undefined, yeah. Modifying a variable twice in the same expression is undefined, not just the order in which they're evaluated. In practical terms, both may increment the original i, rather than the one updated by the other increment.Nothing wrong with the first one.
jalf
+1 jalf. (Pad, pad...)
j_random_hacker
Wouldn't foo(++i, ++i) be more undefined ?
Benoît
How can it be "more undefined"?
Ed Swangren
@Ed hehe exactly, how can 0 be more of a 0
Ólafur Waage
+1: for the reference to the standard.
J.F. Sebastian
+2  A: 

It's perfectly OK. The value passed would be the value of "i" before the increment.

DasBoot
+6  A: 

To build on Kristo's answer,

foo(i++, i++);

yields undefined behavior because the order that function arguments are evaluated is undefined (and in the more general case because if you read a variable twice in an expression where you also write it, the result is undefined). You don't know which argument will be incremented first.

int i = 1;
foo(i++, i++);

might result in a function call of

foo(2, 1);

or

foo(1, 2);

or even

foo(1, 1);

Run the following to see what happens on your platform:

#include <iostream>

using namespace std;

void foo(int a, int b)
{
    cout << "a: " << a << endl;
    cout << "b: " << b << endl;
}

int main()
{
    int i = 1;
    foo(i++, i++);
}

On my machine I get

$ ./a.out
a: 2
b: 1

every time, but this code is not portable, so I would expect to see different results with different compilers.

Bill the Lizard
It might also result in foo(2,2). You don't know which value of i each increment uses, and it might use the original value in both. In theory, it could result in foo(42, -263) as well, but we're probably not likely to see that particular result from a real world compiler. ;)
jalf
oh wait, aren't we off by one in all the examples? Shouldn't it be foo(1,1), foo(1,2) and foo(2,1)?
jalf
Are you sure? Won't the function call always be foo(1,1)? The question is what value i will have afterwards.
Steve Rowe
@Steve Rowe: Suppose the leftmost incrmeent is evaluated first, yielding 1 as the first argument to the function, and setting i to 2. Then the rightmost increment is evaluated, yielding 2 as the second argument, and setting the final i to 3. Then foo gets called with (1,2), and i=3 afterwards.
jalf
@Bill: No, they don't need to have different values, It's undefined (http://www.research.att.com/~bs/bs_faq2.html#evaluation-order). There's no guarantee that they'll be evaluated sequentially (or at all, if we want to be pedantic).
jalf
@jalf: Thanks for catching both my mistakes.
Bill the Lizard
I believe Steve is right. There can never be (2,2) only (1,1) perhaps.
Milan Babuškov
@Milan: Fixed that, thanks.
Bill the Lizard
A: 

To build on Bill the Lizard's answer:

int i = 1;
foo(i++, i++);

might also result in a function call of

foo(1, 1);

(meaning that the actuals are evaluated in parallel, and then the postops are applied).

-- MarkusQ

MarkusQ
I find that impossible.
Milan Babuškov
I think it was supposed to be foo(1,1). Bill had an off by one error earlier, and I think MarkusQ just copied that in his post
jalf
MarkusQ
It might also result in the computer turning into a stuffed panda, according to the standard. Undefined is undefined, and arguing about what one particular implementation is likely to do is pointless.
David Thornley
+2  A: 

To build on MarkusQ's answer: ;)

Or rather, Bill's comment to it:

(Edit: Aw, the comment is gone again... Oh well)

They're allowed to be evaluated in parallel. Whether or not it happens in practice is technically speaking irrelevant.

You don't need thread parallelism for this to occur though, just evaluate the first step of both (take the value of i) before the second (increment i). Perfectly legal, and some compilers may consider it more efficient than fully evaluating one i++ before starting on the second.

In fact, I'd expect it to be a common optimization. Look at it from an instruction scheduling point of view. You have the following you need to evaluate:

  1. Take the value of i for the right argument
  2. Increment i in the right argument
  3. Take the value of i for the left argument
  4. Increment i in the left argument

But there's really no dependency between the left and the right argument. Argument evaluation happens in an unspecified order, and need not be done sequentially either (which is why new() in function arguments is usually a memory leak, even when wrapped in a smart pointer) It's also undefined what happens when you modify the same variable twice in the same expression. We do have a dependency between 1 and 2, however, and between 3 and 4. So why would the compiler wait for 2 to complete before computing 3? That introduces added latency, and it'll take even longer than necessary before 4 becomes available. Assuming there's a 1 cycle latency between each, it'll take 3 cycles from 1 is complete until the result of 4 is ready and we can call the function.

But if we reorder them and evaluate in the order 1, 3, 2, 4, we can do it in 2 cycles. 1 and 3 can be started in the same cycle (or even merged into one instruction, since it's the same expression), and in the following, 2 and 4 can be evaluated. All modern CPU's can execute 3-4 instructions per cycle, and a good compiler should try to exploit that.

jalf
What's the point of analyzing what a compiler is likely to do with something undefined? If you want speed, wouldn't it work better just to emit no code for undefined expressions?
David Thornley
What the compiler does with undefined expressions doesn't matter speedwise or otherwise, because nothing undefined should be in your code in the first place. If your code invokes undefined behavior, you've already lost. ;)
jalf
The point in my post was simply to illustrate a way in which the expression could yield a "surprising" value, and a reason why the compiler might actually choose to do this instead of the more predictable results. Of course relying on any of this would be stupid. :)
jalf
+3  A: 

The standard says the side effect happens before the call, so the code is the same as:

std::list<item*>::iterator i_before = i;

i = i_before + 1;

items.erase(i_before);

rather than being:

std::list<item*>::iterator i_before = i;

items.erase(i);

i = i_before + 1;

So it is safe in this case, because list.erase() specifically doesn't invalidate any iterators other than the one erased.

That said, it's bad style - the erase function for all containers returns the next iterator specifically so you don't have to worry about invalidating iterators due to reallocation, so the idiomatic code:

i = items.erase(i);

will be safe for lists, and will also be safe for vectors, deques and any other sequence container should you want to change your storage.

You also wouldn't get the original code to compile without warnings - you'd have to write

(void)items.erase(i++);

to avoid a warning about an unused return, which would be a big clue that you're doing something odd.

Pete Kirkham
erase() returns an iterator for sequence containers, but not for associative containers like set and map.
bk1e
Added 'sequence' to the post. Some implementations do have the return, but it's not standard.
Pete Kirkham
In the first example, it should be items.erase(i_before);
jmucchiello
Surely you'd need to set warning levels very high to be warned about unused return values? Have to confess I've never bothered with that -- it requires prepending "(void)" to a helluva lot of perfectly valid code. Or am I missing something?
j_random_hacker
Probably not that much valid code; I don't find in onerous and have found otherwise hidden bugs due to it.
Pete Kirkham
+1  A: 
Mr.Ree
+1, but I had to read the standard carefully to convince myself that this was OK. Am I right in thinking that, if bar() was instead a global function returning say int, f was an int, and those invocations were connected by say "^" instead of ".", then any of A, C and D could report "0"?
j_random_hacker
Responded by amending original post. (It took more than 300 chars.)
Mr.Ree
A: 

Sutter's Guru of the Week #55 (and the corresponding piece in "More Exceptional C++") discusses this exact case as an example.

According to him, it is perfectly valid code, and in fact a case where trying to transform the statement into two lines:

items.erase(i);
i++;

does not produce code that is semantically equivalent to the original statement.

Boojum
A: 

* Argument evaluation happens in an unspecified order, and need not be done sequentially either (which is why new() in function arguments is usually a memory leak, even when wrapped in a smart pointer)*

Could someone explain in a bit more detail what is meant by this quote above?

i.e. is the following a potential memory leak and why :

f( boost::shared_ptr( new T ) );

Andy
Please post this as a question, not an answer to a different question.
Kristo