Say, something like a[i] = i++;

A: 

The only type for which C++ guarantees a size is char (along with its signed and unsigned variants), and that size is 1. The size of every other type is platform dependent.
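
As a rough illustration of the point above, here is a minimal sketch of what can and cannot be checked portably. It assumes a compiler with C++11 support (for static_assert and constexpr); bits_in is a hypothetical helper name, not something from the standard library.

```cpp
#include <climits>   // CHAR_BIT

// sizeof(char) is 1 by definition, and a byte has at least 8 bits.
static_assert(sizeof(char) == 1, "sizeof(char) is always 1");
static_assert(CHAR_BIT >= 8, "a byte has at least 8 bits");

// For the other integer types the standard only orders the sizes.
static_assert(sizeof(short) <= sizeof(int), "short is no wider than int");
static_assert(sizeof(int) <= sizeof(long), "int is no wider than long");

// Hypothetical helper: number of bits in the object representation of T
// on the platform this is compiled for.
template <typename T>
constexpr unsigned bits_in() {
    return static_cast<unsigned>(sizeof(T) * CHAR_BIT);
}
```

Calling bits_in<long>() reports whatever the current platform uses, so the result should be treated as a property of the build and not of the language.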

JaredPar
Isn't that what <cstdint> is for? It defines types such as uint16_t et cetera.
Jasper Bekkers
Yes, but the size of most types, say long, is not well defined.
JaredPar
Also, cstdint isn't part of the current C++ standard yet. See boost/cstdint.hpp for a currently portable solution.
Evan Teran
That's not undefined behaviour. The standard says that a conforming platform defines the sizes, rather than the standard defining them.
Daniel Earwicker
Also note that the standard does not define how many bits a byte has. It is at least 8 bits, but anything above that is allowed, so a C++ byte is not necessarily equal to a real-life byte. Anyway, I vote you up, as this does not deserve a down-vote.
phresnel
+2  A: 

Maybe "What C++ pitfalls should I avoid?" will help.

gimel
+71  A: 

By going through the C++ standard, I found that the following actions will yield undefined behavior. My list does not include the use of the C++ standard library.

  • Dereferencing a NULL pointer
  • Dereferencing a pointer returned by a request for a zero-size object
  • Using pointers to objects whose lifetime has ended (for instance, stack allocated objects or deleted objects)
  • Depending on the value of an uninitialized automatic variable
  • A zero second operand to integer / and % operators
  • Performing pointer arithmetic that yields a result outside the boundaries (+1) of an array.
  • Dereferencing the pointer to (after) the end of an array.
  • Converting a floating point value to a value that can't be represented by the target type
  • Evaluating an expression whose result is not in the range of the corresponding types
  • Converting pointers to objects of incompatible types
  • Attempting to modify a string literal or other const object during its lifetime
  • Not returning a value from a value-returning function (directly or by flowing off from a try-block)
  • The value of any object of type other than volatile sig_atomic_t at the receipt of a signal
  • A non-empty file that doesn't end with a newline, or ends with a backslash
  • Preprocessor numeric values that can't be represented by a long int
  • A backslash followed by a character that is not part of the specified escape codes in a character or string constant.
  • Concatenating a narrow with a wide string literal during preprocessing
  • Multiple different definitions for the same entity (class, template, enumeration, inline function, static member function, etc.)
  • Calling exit during the destruction of an object with static storage duration
  • Cascading destructions of objects with static storage duration
  • Shifting values by a negative amount
  • The result of assigning to partially overlapping objects
  • Recursively re-entering a function during the initialization of its static objects
  • Making virtual function calls to pure virtual functions of an object from its constructor or destructor
  • Referring to nonstatic members of objects that have not been constructed or have already been destructed
  • Infinite recursion in the instantiation of templates
  • Dynamically generating the defined token in a #if expression
  • Preprocessing directive on the left side of a function-like macro definition
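
Two of the items above (a zero second operand to /, and a result that is not representable) can be guarded against with an explicit check. The following is only a sketch; safe_divide is a hypothetical helper name, and rejecting INT_MIN / -1 assumes the usual two's-complement range.

```cpp
#include <climits>   // INT_MIN

// Hypothetical helper: performs a / b only when the result is defined.
// Returns false and leaves `out` untouched otherwise.
bool safe_divide(int a, int b, int& out) {
    if (b == 0) {
        return false;  // zero second operand to /
    }
    if (a == INT_MIN && b == -1) {
        return false;  // quotient would not fit in an int (two's complement)
    }
    out = a / b;
    return true;
}
```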
Diomidis Spinellis
Why would you community wiki this answer!? You deserve the rep for it! Good work!
Simucal
So other people could add to the list.
SLaks
+13  A: 

The order that function parameters are evaluated is unspecified.
The only requirement is that all parameters must be fully evaluated before the function is called.

// The simple obvious one.
callFunc(getA(),getB());

// Is this 
int a = getA();
int b = getB();
callFunc(a,b);

// or
int b = getB();
int a = getA();
callFunc(a,b);

// The answer is either. Up to the compiler.
// The answer can matter depending on the side effects.
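
If the order matters, it can be forced by evaluating each argument in its own statement. A minimal sketch: the bodies of getA, getB and callFunc below are made up for illustration (they just record that they ran), and callInOrder is a hypothetical name.

```cpp
#include <string>

// Hypothetical stand-ins: each getter records that it ran.
std::string call_log;

int getA() { call_log += 'A'; return 1; }
int getB() { call_log += 'B'; return 2; }
int callFunc(int a, int b) { return a * 10 + b; }

// Each declaration is a full expression, so getA() has completed
// before getB() starts, whatever the compiler.
int callInOrder() {
    const int a = getA();
    const int b = getB();
    return callFunc(a, b);
}
```

After callInOrder() returns, call_log holds the getters in the order they were written, which callFunc(getA(), getB()) does not promise.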
Martin York
That is very nasty.
JaredPar
The order is unspecified, not undefined.
Rob Kennedy
I hate this one :) I lost a day of work once tracking down one of these cases... anyways learned my lesson and haven't fallen again fortunately
Robert Gould
@Rob: I would argue with you about the change in meaning here, but I know the standards committee is very picky on the exact definition of these two words. So I'll just change it :-)
Martin York
I got lucky on this one. I got bitten by it when I was in college and had a professor who took one look at it and told me my problem in about 5 seconds. No telling how much time I would have wasted debugging otherwise.
Bill the Lizard
wtf............
Longpoke
+2  A: 

Variables may only be updated once in an expression
(Technically once between sequence points).

int i =1;
i = ++i;

// Undefined. Assignment to i twice in the same expression.
Martin York
@Martin: In fact, _at most_ once between two sequence points.
Prasoon Saurav
+11  A: 

The compiler is free to reorder the evaluation of the parts of an expression (assuming the meaning is unchanged).

From the original question:

a[i] = i++;

// This expression has three parts:
(a) a[i]
(b) i++
(c) Assign (b) to (a)

// (c) is guaranteed to happen after (a) and (b)
// But (a) and (b) can be done in either order.
// See n2521 Section 5.17
// (b) increments i but returns the original value.
// See n2521 Section 5.2.6
// Thus this expression can be written as:

int  rhs = i++;
int& lhs = a[i];
lhs = rhs;

// or
int& lhs = a[i];
int  rhs = i++;
lhs = rhs;
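
Either reading can be spelled out so that nothing is left to the compiler. A sketch of both, with hypothetical function names; the caller is assumed to pass an array large enough for the index used.

```cpp
// Reading 1: index with the old value of i, then increment.
void store_then_increment(int* a, int& i) {
    a[i] = i;  // index and stored value both use the old i
    ++i;       // the increment is a separate statement
}

// Reading 2: increment first, then store the old value at the new index.
void increment_then_store(int* a, int& i) {
    const int old = i;
    ++i;
    a[i] = old;
}
```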

Double Checked locking. And one easy mistake to make.

A* a = new A("plop");

// Looks simple enough.
// But this can be split into three parts.
(a) allocate Memory
(b) Call constructor
(c) Assign value to 'a'

// No problem here:
// The compiler is allowed to do this:
(a) allocate Memory
(c) Assign value to 'a'
(b) Call constructor.
// This is because the whole thing is between two sequence points.

// So what is the big deal.
// Simple Double checked lock. (I know there are many other problems with this).
if (a == NULL) // (Point B)
{
    Lock   lock(mutex);
    if (a == NULL)
    {
        a = new A("Plop");  // (Point A).
    }
}
a->doStuff();

// Think of this situation.
// Thread 1: Reaches point A. Executes (a)(c)
// Thread 1: Is about to do (b) and gets unscheduled.
// Thread 2: Reaches point B. It can now skip the if block
//           Remember (c) has been done thus 'a' is not NULL.
//           But the memory has not been initialized.
//           Thread 2 now executes doStuff() on an uninitialized variable.

// The solution to this problem is to move the assignment of 'a'
// To the other side of the sequence point.
if (a == NULL) // (Point B)
{
    Lock   lock(mutex);
    if (a == NULL)
    {
        A* tmp = new A("Plop");  // (Point A).
        a = tmp;
    }
}
a->doStuff();

// Of course there are still other problems because of C++ support for
// threads. But hopefully these are addressed in the next standard.
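
For reference, C++11 did add a memory model and std::atomic, which let the double-checked pattern be written with explicit ordering. The following is only a sketch under that assumption: the struct A, instance_ptr, instance_mutex and get_instance are hypothetical names, and the object is deliberately never deleted.

```cpp
#include <atomic>
#include <mutex>
#include <string>

// Hypothetical stand-in for the class A used above.
struct A {
    explicit A(const std::string& n) : name(n) {}
    std::string name;
};

std::atomic<A*> instance_ptr(nullptr);
std::mutex instance_mutex;

A* get_instance() {
    // The acquire load pairs with the release store below: seeing a
    // non-null pointer implies the constructor's writes are visible.
    A* p = instance_ptr.load(std::memory_order_acquire);
    if (p == nullptr) {
        std::lock_guard<std::mutex> lock(instance_mutex);
        // Second check under the lock; the mutex already orders this load.
        p = instance_ptr.load(std::memory_order_relaxed);
        if (p == nullptr) {
            p = new A("Plop");
            instance_ptr.store(p, std::memory_order_release);
        }
    }
    return p;
}
```

The release store is what prevents the pointer from being published before the constructor has finished, which is the reordering described above.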
Martin York
What is meant by a sequence point?
yesraaj
http://en.wikipedia.org/wiki/Sequence_point
Martin York
Ooh... that's nasty, especially since I've seen that exact structure recommended in Java
Tom
Note that some compilers do define the behaviour in this situation. In VC++ 2005+, for example, if a is volatile, the needed memory barriers are set up to prevent instruction reordering so that double-checked locking works.
Eclipse
@Eclipse: Interesting.
Martin York
@Martin York: "// (c) is guaranteed to happen after (a) and (b)" Is it? Admittedly in that particular example the only scenario where it could matter would be if 'i' was a volatile variable mapped to a hardware register, and a[i] (old value of 'i') was aliased to it, but is there any guarantee that the increment will happen before a sequence point?
supercat
@supercat: Yes it is guaranteed. Both sides of the `=` must be evaluated before it can be evaluated.
Martin York
@Martin York: The expression "i=i++;" is commonly given as an example of undefined behavior in C (I know the behavior is defined in some other languages like Java). If the side effects of the increment are not guaranteed to be over and done with by the time an "i=i++;" assignment occurs, why would they be guaranteed to be complete by the time "a[i]=i++;" occurs?
supercat
@supercat: OK I see where you are coming from. You are correct the effect of the operator ++ (on i) is not guaranteed to be done by that point.
Martin York
+3  A: 
const int i = 10; 
int *p =  const_cast<int*>( &i );
*p = 1234; //Undefined
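
For contrast, the same cast is well defined when the object being modified was not declared const in the first place. A minimal sketch; overwrite is a hypothetical helper name.

```cpp
// Writes through a pointer-to-const by casting the const away.
// This is defined only if the pointed-to object was not declared const.
void overwrite(const int* p, int value) {
    *const_cast<int*>(p) = value;
}
```

Passing the address of a plain int is fine; passing the address of a const int reproduces the undefined case.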
yesraaj
+5  A: 

My favourite is "Infinite recursion in the instantiation of templates" because I believe it's the only one where the undefined behaviour occurs at compile time.
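
For comparison, here is a sketch of template recursion that does terminate, because an explicit specialization supplies the base case. Factorial is just an illustrative name; removing the specialization for 0 is what would turn this into the unbounded case.

```cpp
// Primary template: recurses on N - 1.
template <unsigned N>
struct Factorial {
    static const unsigned long value = N * Factorial<N - 1>::value;
};

// Explicit specialization: the base case that stops the recursion.
template <>
struct Factorial<0> {
    static const unsigned long value = 1;
};
```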

Daniel Earwicker
Done this before, but I don't see how it's undefined. In hindsight it's quite obvious that you're doing an infinite recursion.
Robert Gould
The problem is that the compiler cannot examine your code and decide precisely whether it will suffer from infinite recursion or not. It's an instance of the halting problem. See: http://stackoverflow.com/questions/235984/the-halting-problem-in-the-field#334759
Daniel Earwicker
Yeah, it's definitely a halting problem.
Robert Gould
It made my system crash because of swapping caused by too little memory.
Johannes Schaub - litb
Preprocessor constants that don't fit into an int is also compile time.
Joshua
A: 

The evaluation order of function parameters is arbitrary. (So do not put operations with side effects inside the parentheses of a function call.)

Ronny
A: 

Any program that uses multithreading.

This isn't undefined per se. It just isn't directly supported by the language. But by the same logic you could say, "any program which uses Windows Edit Boxes" which is, of course, nonsense.
John Dibling
multithreading and "edit boxes" are not analogous in this context. I'm not sure whether or not multithreading is officially considered a source of undefined behavior, but clearly multithreading has the potential to break a program in ways that could be considered "undefined" or at least "unpredictable". The answer could have perhaps been stated less broadly, but it does make a good point.
nobar
+4  A: 

Besides undefined behaviour there is also equally nasty implementation-defined behaviour.

Undefined behaviour occurs when a program does something the result of which is not specified by the standard.

Implementation-defined behaviour is an action by a program the result of which is not defined by the standard, but which the implementation is required to document. An example is "Multibyte character literals" from this question.

Implementation-defined behaviour only bites you when you start porting (but upgrading to a new version of the compiler is also porting!).

Constantin
A: 

Namespace-level objects in different compilation units should never depend on each other for initialization, because their initialization order is undefined.
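
One common way around this is to wrap the object in a function, so that it is constructed the first time it is used. A minimal sketch; greeting and banner are hypothetical names standing in for objects that would otherwise live in different compilation units.

```cpp
#include <string>

// Hypothetical namespace-level object, wrapped in a function so that it is
// constructed on first use and not at some point during static initialization.
const std::string& greeting() {
    static const std::string value("hello");
    return value;
}

// Code in another compilation unit can depend on it without caring
// which unit happens to be initialized first.
std::string banner() {
    return greeting() + ", world";
}
```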

yesraaj
+5  A: 

A guide to undefined behavior in C and C++.

rursw1
Very enjoyable blog series. Especially the part about instruction reordering: I knew about it, of course, but never thought it could hamper debugging :)
Matthieu M.