
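An example of unspecified behavior in the C language is the order of evaluation of arguments to a function. It might be left to right or right to left; you just don't know. This would affect how foo(c++, c) or foo(++c, c) gets evaluated. (Strictly, those two calls are even worse than unspecified: c is modified and read in unsequenced argument expressions, which is undefined behavior.)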
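The unspecified part can be shown without undefined behavior by giving each argument its side effect inside a separate function. Function calls are not interleaved with each other, so the trace below is well defined; only its order is unspecified. A minimal sketch; `trace`, `first_arg`, `second_arg`, `add` and `evaluate_and_report` are hypothetical names:

```c
#include <stdio.h>

/* Records which argument expression was evaluated first. */
static char trace[3];
static int trace_len = 0;

static int first_arg(void)  { trace[trace_len++] = 'f'; return 1; }
static int second_arg(void) { trace[trace_len++] = 's'; return 2; }

static int add(int a, int b) { return a + b; }

/* The result (3) is well defined, but trace may end up "fs" or "sf":
   the order in which the arguments are evaluated is unspecified. */
int evaluate_and_report(void)
{
    trace_len = 0;
    int sum = add(first_arg(), second_arg());
    trace[trace_len] = '\0';
    printf("sum = %d, evaluation order = %s\n", sum, trace);
    return sum;
}
```

Different compilers (or the same compiler at different optimization levels) may legitimately print different orders.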

What other unspecified behavior is there that can surprise the unaware programmer?

A: 

Be sure to always initialize your variables before you use them! When I had just started with C, that caused me a number of headaches.
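A small illustration of the rule behind this advice: objects with static storage duration are zero-initialized, while automatic (local) variables start with an indeterminate value, so an accumulator must be initialized explicitly. `sum_ints` and `call_count` are hypothetical names:

```c
#include <stddef.h>

/* Static storage duration: guaranteed to start at 0 even without an initializer. */
static int call_count;

int sum_ints(const int *values, size_t n)
{
    int total = 0;   /* automatic: without "= 0" its starting value is indeterminate */
    call_count++;
    for (size_t i = 0; i < n; i++)
        total += values[i];
    return total;
}
```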

William Keller
+12  A: 

A language lawyer question. Hmkay.

My personal top 3:

  1. violating the strict aliasing rule
  2. violating the strict aliasing rule
  3. violating the strict aliasing rule

    :-)

Edit: Here is a little example that does it wrong twice:

(assume 32-bit ints and little-endian)

float funky_float_abs (float *a)
{
  /* WRONG: accesses a float object through an unsigned int lvalue */
  unsigned int * temp = (unsigned int*) a;
  *temp &= 0x7fffffff;
  return *(float*)temp;
}

That code tries to get the absolute value of a float by bit-twiddling the sign bit directly in the representation of a float.

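However, accessing an object through a pointer to a different type (such as the one this cast creates) is undefined behaviour in C. The compiler may assume that pointers to different types don't point to the same chunk of memory. The exception is the character types: char, signed char and unsigned char may alias anything (signedness does not matter). A void* can hold the address of anything too, but it has to be converted back to a real type before it is dereferenced.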

In the case above I do that twice: once to get an int alias for the float *a, and once to convert the value back to float.

There are two valid ways to do the same.

Use a char pointer for the access. Character types may alias anything, so this is safe.

float funky_float_abs (float *a)
{
  float temp_float = *a;
  // valid, because it's a char pointer. These are special.
  unsigned char * temp = (unsigned char *) &temp_float;
  temp[3] &= 0x7f;  // little endian: the sign bit is in the last byte
  return temp_float;
}

Use memcpy. It copies the object representation byte by byte through void pointers, so no object is ever accessed through an incompatible type.

#include <string.h>

float funky_float_abs (float *a)
{
  unsigned int i;
  float result;
  memcpy (&i, a, sizeof i);            /* copy the float's bytes into an int */
  i &= 0x7fffffff;                     /* clear the sign bit */
  memcpy (&result, &i, sizeof result);
  return result;
}
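The memcpy approach can also drop the "32-bit int" assumption by using uint32_t and letting the compiler check the size. A sketch, still assuming IEEE 754 single precision with the sign in the top bit, and that float and uint32_t share the same byte order (true on mainstream platforms, not guaranteed by C); `float_abs_bits` is a hypothetical name, and _Static_assert needs C11:

```c
#include <stdint.h>
#include <string.h>

/* Compile-time check: float and uint32_t must have the same size. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "expects a 32-bit float");

float float_abs_bits(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);   /* copy the representation, no aliasing */
    bits &= UINT32_C(0x7fffffff);     /* clear the IEEE 754 sign bit */
    memcpy(&x, &bits, sizeof x);
    return x;
}
```

Compilers typically turn these memcpy calls into plain register moves, so this costs nothing over the pointer-cast version.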

And while I'm at it: The next code has nothing to do with strict aliasing. It works in practice but it relies on undefined behaviour as well:

float funky_float_abs (float *a)
{
  union 
  {
     unsigned int i;
     float f;
  } cast_helper;

  cast_helper.f = *a;
  cast_helper.i &= 0x7fffffff;
  return cast_helper.f;
}

I'd say it is safe to type-pun via unions, even if it is undefined. If the compiler guys changed this behaviour to be standard-compliant, it would simply break too much code.

Nils Pipenbrinck
This sounds interesting... can you expand?
Benoit
The last one is defined to be safe from the POV of aliasing, since it doesn't use pointers. The problem with it is that it relies on floats being IEEE-represented, which is implementation-dependent (3.9.1.8). So the result is consistent in a given implementation but might not be equal to fabs(*f).
Steve Jessop
fabs(*a), I mean.
Steve Jessop
Oh yeah, unless int is bigger than float, in which case the union hasn't been initialised properly, and the result isn't consistent. Sorry.
Steve Jessop
Ahem. I mentioned that I assume 32-bit ints and little-endian. Btw, the union usage is still undefined behaviour, not because of the IEEE bit representation but simply because you are (in theory) not allowed to write into field f and read from field i.
Nils Pipenbrinck
Can you provide a good reference for that? The whole world says that unions are safe here, which of course doesn't mean they're right. I will have to trawl through the C and C++ standards tomorrow for my own satisfaction, if you don't save me the effort :-)
Steve Jessop
For a given implementation, you know the storage representations and the struct layout rules for POD. Only one member can be stored at a time, but it must be in its implementation-dependent storage representation. So you can predict the result of reading it, since aliasing doesn't apply. Or not?
Steve Jessop
onebyone, it's undefined behavior even if the implementation uses ieee. the point is it reads from a different member that was last written to.
Johannes Schaub - litb
http://www.csci.csusb.edu/dick/c++std/cd2/basic.html#basic.lval bullet 15 seems to imply that type punning through a union is safe. The wording in the c standard is identical.
Greg Rogers
the C99 standard allows type punning through unions; see footnote 82, which was added with TC3: "If the member used to access the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning"). This might be a trap representation."
Christoph
+10  A: 

My favorite is this:

// what does this do?
x = x++;

To answer some comments, it is undefined behaviour according to the standard. Seeing this, the compiler is allowed to do anything up to and including format your hard drive. See for example this comment here. The point is not that you can see there is a possible reasonable expectation of some behaviour. Because of the C++ standard and the way the sequence points are defined, this line of code is actually undefined behaviour.

For example, if we had x = 1 before the line above, then what would the valid result be afterwards? Someone commented that it should be

x is incremented by 1

so we should see x == 2 afterwards. However, this is not actually true: you will find some compilers that have x == 1 afterwards, or maybe even x == 3. You would have to look closely at the generated assembly to see why this might be, but the differences are due to the underlying problem. Essentially, I think this is because the compiler is allowed to perform the two modifications of x (the assignment and the increment) in any order it likes, so it could do the x++ first, or the x = first.
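If the intent was simply to increment x, each of the statements below modifies x only once per full expression, so all of them are well defined. Post-increment whose old value is stored in a *different* variable is fine too. A sketch with hypothetical helper names:

```c
/* Well-defined alternatives to x = x++; each modifies x at most once
   between sequence points. */
int increment_three_ways(int x)
{
    x++;          /* x + 1 */
    x += 1;       /* x + 2 */
    x = x + 1;    /* x + 3: reading x on the right-hand side is fine */
    return x;
}

/* Post-increment assigned to another variable is also well defined. */
int old_value_after_increment(int *x)
{
    int old = (*x)++;   /* old gets the previous value, *x is incremented */
    return old;
}
```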

1800 INFORMATION
X is incremented by 1. You assigned x to itself and then incremented it. It is equivalent to x++;
Charles Graham
This does x = x; x += 1; so yeah, like Charles Graham says. I wouldn't call this unspecified.
Orion Edwards
Modifying a variable more than once between two sequence points is explicitly stated as undefined behaviour in both standard C and C++.
KTC
I'm cracking up laughing right now at the thought of someone writing a C compiler that formats your hard drive upon seeing x = x++ because it's undefined in the standard :-)
dancavallaro
+1, especially for the "formatting hard drive part". Actually, for people who code like this, formatting the hard drive might save future generations of maintenance programmers a lot of grief...
sleske
A: 

Using the macro versions of functions like "max" or "isupper". Such macros may evaluate their arguments twice, so you will get unexpected side effects when you call max(++i, j), or isupper(*p++). (Standard C actually requires library macros such as isupper to evaluate their argument exactly once, so in practice this mostly bites with hand-written macros like max.)

The above is for C. In C++ these problems have largely disappeared: std::max is a function template.
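Here is the double evaluation in action, using a typical hand-written MAX macro (the macro and function names are hypothetical). Because the ?: operator has a sequence point after its first operand, this particular expansion is well defined, just surprising:

```c
/* A typical hand-written macro: an argument may be evaluated twice. */
#define MAX(a, b) ((a) > (b) ? (a) : (b))

/* A function evaluates each argument exactly once before the call. */
static int max_int(int a, int b) { return a > b ? a : b; }

int macro_max_demo(int *i, int j)
{
    /* Expands to ((++*i) > (j) ? (++*i) : (j)): when the first
       argument "wins", *i is incremented twice. */
    return MAX(++*i, j);
}

int function_max_demo(int *i, int j)
{
    return max_int(++*i, j);   /* *i is incremented exactly once */
}
```

Starting from i = 5 and j = 3, the macro version returns 7 and leaves i at 7, while the function version returns 6 and leaves i at 6.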

Mike Thompson
By the way, that's not unspecified behaviour, it's just not the behaviour people expect on first sight. It's totally consistent between all platforms and compilers, though.
Steve Jessop
onebyone, whether it's UB or not depends on the implementation of those. if it's a > b ? a : b; then it's not. but if it happens to use a or b between two sequence points more than once, then it is UB (with a or b being ++i)
Johannes Schaub - litb
A: 

Segfault, GPF, Bus Error

:)

Arkadiy
A: 
switch (value)
{
   case 1:
    // Do some stuff, forgot the break!
   case 2:
    // Do other stuff
    break;
}

Hey why does this switch statement fail with a value of 1 !?!?
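A runnable version of the snippet makes the fall-through visible: the hypothetical `run_switch` below records which case bodies ran as a bit mask. With value 1, both bodies execute (GCC and Clang can warn about this with -Wimplicit-fallthrough):

```c
/* Returns a bit mask of which case bodies ran. */
int run_switch(int value)
{
    int ran = 0;
    switch (value)
    {
    case 1:
        ran |= 1;   /* forgot the break: execution falls into case 2 */
    case 2:
        ran |= 2;
        break;
    default:
        break;
    }
    return ran;
}
```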

Doug T.
That's not unspecified behaviour, it's just not what the programmer intended to say. Falling through from one case to another has a defined effect in the spec, and sometimes is done deliberately.
Steve Jessop
Yeah, C# got rid of this because it bit too many people in the butt.
Charles Graham
No Duff's device? I knew there was some reason I've never learned C#.
Steve Jessop
A: 

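Forgetting to add static float foo(); to the header file, only to get floating-point exceptions when it returned 0.0f. (Without a declaration in scope, a C89 compiler implicitly assumes foo returns int, so the float return value is misinterpreted.)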

Nicholas Mancuso
A: 

unsigned char *ucptr;
unsigned int data[123];
...
ucptr = (unsigned char *)&data[i];
// then try to use *ucptr or ucptr[3], etc.

If you don't use a union or a function, it doesn't always work the way you want it to.
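Reading the bytes of an integer through an unsigned char pointer is itself allowed (character types may alias anything); what varies between machines is which byte comes first. A sketch with hypothetical helper names, using uint32_t so the value has a known width:

```c
#include <stdint.h>

/* Returns 1 on a little-endian machine, 0 on big-endian
   (other byte orders are possible in theory but rare). */
int is_little_endian(void)
{
    uint32_t value = UINT32_C(0x01020304);
    const unsigned char *bytes = (const unsigned char *)&value;  /* cast is required */
    return bytes[0] == 0x04;
}

/* Portable alternative: extract bytes with shifts and masks,
   which gives the same answer regardless of byte order. */
unsigned low_byte(uint32_t value)
{
    return (unsigned)(value & 0xffu);
}
```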

dwelch
Actually, even with a union it is unspecified behaviour, isn't it?
sleske
With a union the compiler isn't required to read your mind to figure out what you intended it to do. For the human, it reinforces that you understand the boundary you are crossing and that you are being intentionally careful about crossing it. That reduces errors and improves reliability and productivity.
dwelch
+3  A: 

Dividing something by a pointer to something. Just won't compile for some reason... :-)

result = x/*y;
Adam Pierce
result = x/(*y) should work.
Charles Graham
This would be funnier if StackOverflow didn't cleverly do syntax highlighting :-)
Steve Jessop
Haha nice one, I am writing it down :-)
Drealmer
+7  A: 

My personal favourite undefined behaviour is that if a non-empty source file doesn't end in a newline, behaviour is undefined.

I suspect it's true though that no compiler I will ever see has treated a source file differently according to whether or not it is newline terminated, other than to emit a warning. So it's not really something that will surprise unaware programmers, other than that they might be surprised by the warning.

So for genuine portability issues (which mostly are implementation-dependent rather than unspecified or undefined, but I think that falls into the spirit of the question):

  • char is not necessarily (un)signed.
  • int can be any size of at least 16 bits.
  • floats are not necessarily IEEE-formatted or conformant.
  • integer types are not necessarily two's complement (won't arise on modern hardware, but a fun point for pedantry), and signed integer overflow causes undefined behaviour (unsigned arithmetic wraps around).
  • "/", "." and ".." in a #include have no defined meaning and can be treated differently by different compilers (this does actually vary, and if it goes wrong it will ruin your day).
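Because signed overflow is undefined, the usual portable approach is to test before adding, and <limits.h> tells you how the implementation-defined choices were made. A minimal sketch; `checked_add` and `plain_char_is_signed` are hypothetical names (GCC and Clang also offer __builtin_add_overflow, but that is a compiler extension):

```c
#include <limits.h>
#include <stdbool.h>

/* Stores a + b in *out and returns true, or returns false if the sum
   would overflow int (which would be undefined behaviour). */
bool checked_add(int a, int b, int *out)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;
    *out = a + b;
    return true;
}

/* Whether plain char is signed is implementation-defined. */
bool plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}
```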

Really serious ones that can be surprising even on the platform you developed on, because behaviour is only partially undefined / unspecified:

  • POSIX threading and the ANSI memory model. Concurrent access to memory is not as well defined as novices think. volatile doesn't do what novices think. Order of memory accesses is not as well defined as novices think. Accesses can be moved across memory barriers in certain directions. Memory cache coherency is not required.

  • Profiling code is not as easy as you think. If your test loop has no effect, the compiler can remove part or all of it. inline has no guaranteed effect on whether a function is actually inlined.

And, as I think Nils mentioned in passing:

  • VIOLATING THE STRICT ALIASING RULE.
Steve Jessop
+2  A: 

A compiler doesn't have to tell you that you're calling a function with the wrong number of parameters/wrong parameter types if the function prototype isn't available.
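With a prototype in scope, arguments are converted to the parameter types and mismatches must be diagnosed. Without one (for example with an old-style declaration such as double half();), an int argument to a double parameter is passed as an int, which is undefined. A sketch with a hypothetical `half` function:

```c
/* The prototype: the compiler now knows half takes a double. */
double half(double x);

double half(double x)
{
    return x / 2.0;
}

double half_of_three(void)
{
    /* With the prototype in scope, the int 3 is converted to 3.0
       before the call, and half("3") would be a diagnosed error. */
    return half(3);
}
```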

mbac32768
Yes. Benevolent compilers however will usually help you with a warning...
sleske
+3  A: 

The EEs here just discovered that a>>-2 is a bit fraught.

I nodded and told them it was not natural.
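Shifting by a negative count, or by a count at least the width of the promoted left operand, is undefined behaviour in C. One defensive option is a helper that range-checks the count first. `shift_right_checked` and its choice to return 0 for out-of-range counts are hypothetical design decisions, not standard behaviour, and the width calculation assumes unsigned has no padding bits:

```c
#include <limits.h>

/* Right shift of an unsigned value that never invokes UB:
   out-of-range counts (negative or >= the bit width) yield 0 here. */
unsigned shift_right_checked(unsigned a, int n)
{
    const int width = (int)(sizeof a * CHAR_BIT);
    if (n < 0 || n >= width)
        return 0u;
    return a >> n;
}
```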

Tim Williscroft
+1  A: 

Another issue I encountered (which is implementation-defined, but definitely unexpected).

char is evil.

  • signed or unsigned, depending on what the compiler feels like
  • not mandated as 8 bits
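One practical consequence: the <ctype.h> functions take an int that must be representable as unsigned char (or equal EOF), so passing a plain char that happens to be negative is undefined. The usual fix is a cast. `is_upper_char` and `bits_per_char` are hypothetical helpers:

```c
#include <ctype.h>
#include <limits.h>

/* Plain char may be signed, so bytes >= 0x80 can arrive as negative
   values; converting to unsigned char first keeps isupper's argument valid. */
int is_upper_char(char c)
{
    return isupper((unsigned char)c) != 0;
}

/* CHAR_BIT is at least 8, but the standard does not fix it at 8. */
int bits_per_char(void)
{
    return CHAR_BIT;
}
```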
itj
Well, it's not evil if you use it for what it is meant for, i.e. for *characters*...
sleske