views: 647
answers: 8
Possible Duplicates:
What’s the worst example of undefined behaviour actually possible?
What are all the common undefined behaviour that c++ programmer should know about?
What are the common undefined behaviours that Java Programmers should know about

There are lots of parts of the C and C++ standards that are not completely specified, because to do so might exclude some oddball architecture. Relying on the behavior of your particular compiler and CPU is generally frowned upon, for good reason. On the other hand some behaviors are so universal that you could easily get away with relying on them, even if it is technically undefined.

For example, initializing a pointer to 0xDEADBEEF could lead to bad results on a processor that traps on invalid pointer addresses. But I've never used such a processor, and I'll bet that you haven't either. I'm sure plenty of people have caught a bug or two with this technique, without any winged monkeys taking flight from their nostrils.
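
A minimal sketch of that debugging idiom, with illustrative names; forming such a pointer value is implementation-defined and dereferencing it is undefined, but on common flat-address-space platforms it faults reliably:

#include <cstdint>

// kPoisonPtr is an illustrative name, not a standard facility.
static void *const kPoisonPtr =
    reinterpret_cast<void *>(static_cast<std::uintptr_t>(0xDEADBEEF));

struct Node {
    Node *next;
};

void detach(Node *n)
{
    // A later dereference of n->next should crash loudly rather than
    // silently reuse stale data.
    n->next = static_cast<Node *>(kPoisonPtr);
}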

On the other hand, if you had old Mac code that depended on the endianness of the processor, you probably lived to regret that decision.

I'd love to see some real world examples of both good and bad reliance on unspecified behavior. It will be interesting to see which type gets the most votes.

There was a similar question with an entirely different emphasis: What’s the worst example of undefined behaviour actually possible? I'm asking for examples where you are relying on the undefined behavior.

+1  A: 

Identical string literals may share the same memory, and may or may not be writable; whether an edit changes one of them, both of them, or simply isn't allowed is undefined.
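
A minimal sketch of the hazard, assuming nothing about the implementation's literal pooling:

#include <cstring>

int main()
{
    const char *a = "abcde";
    const char *b = "abcde";    // the compiler may or may not pool these
    char *s = const_cast<char *>(std::strchr(a, 'c'));
    (void)s;
    // *s = '1';                // undefined: the literal may live in read-only memory
    return (a == b) ? 0 : 1;    // unspecified which result you get
}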

Martin Beckett
You have to go to some trouble to cast those to a non-const array, don't you? Has that ever happened to you?
Mark Ransom
They're not const-qualified in C (for compatibility with C programs written before `const` came along).
caf
In C++ string literals are arrays of `const char` but a string literal can be implicitly converted to a `char*` for compatibility with C.
Charles Bailey
@Mark, no trouble: `s = strchr("abcde", 'c'); *s = '1';`
Secure
+2  A: 

This type of thing is reasonably common:

#include <stdlib.h>

void free_and_null(void **p)
{
    free(*p);
    *p = NULL;
}

int *x = malloc(100);
free_and_null((void **)&x);  /* accesses an int * object through a
                                void * lvalue: a strict-aliasing violation */

...and it can, at least in theory, break these days on compilers that optimise assuming strict interpretation of aliasing rules.

The absolute most common one is probably the old trick of retrieving the value of a union member that wasn't the one most recently stored to.
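
A hedged sketch of that union trick; C++ calls reading the inactive member undefined, C99 permits the bytes to be reinterpreted, and most compilers in practice support it, which is why it is so common. The IEEE 754 layout noted in the comment is itself an assumption:

#include <cinttypes>
#include <cstdint>
#include <cstdio>

union FloatBits {
    float         f;
    std::uint32_t u;   // assumes sizeof(float) == sizeof(uint32_t)
};

int main()
{
    FloatBits fb;
    fb.f = 1.0f;                           // store through one member...
    std::printf("%08" PRIx32 "\n", fb.u);  // ...read through the other;
                                           // typically prints 3f800000
    return 0;
}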

caf
Case one for the macro solution!
Chris Lutz
I do not see the problem here. Could you elaborate?
Thomas
Assumes that sizeof(int*)==sizeof(void*). If sizeof(void*)>sizeof(int*) you have bigger issues.
MSalters
It also accesses an object of type `int *` through an lvalue of type `void *`. That's simply not guaranteed to work.
caf
A: 
Alok
The problem isn't EOF causing an overrun - when `fgetc()` returns EOF it doesn't go through the intermediate `unsigned char` conversion. The problem is if a char read from the stream, when converted to an `unsigned char`, has a value that can't be represented as an `int` - that unsigned char will be converted to an `int` when returned, resulting in an implementation-defined (not undefined) value, which I suppose could happen to match the EOF value.
Michael Burr
Hmm, makes much more sense than my understanding. Thanks!
Alok
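
For reference, a minimal sketch of the `fgetc()` idiom the comments above are about; the file name is illustrative:

#include <cstdio>

int main()
{
    std::FILE *f = std::fopen("data.bin", "rb");
    if (!f) return 1;
    int c;                              // int, not char: EOF must stay
                                        // distinguishable from data
    while ((c = std::fgetc(f)) != EOF) {
        std::putchar(c);
    }
    std::fclose(f);
    return 0;
}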
A: 

How many times have I seen code like this:

void write(int fileID, int data)
{
    write(fileID, &data, 4); // If they are on the ball, 4 is sizeof(int)
}

int read(int fileID)
{
    int result;
    read(fileID, &result, 4);
    return result;
}

But with no consideration for endianness.
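
A minimal sketch of a byte-order-safe alternative, assuming a POSIX descriptor API; `htonl()`/`ntohl()` pin down a big-endian wire format and `uint32_t` pins down the width (the function names are illustrative):

#include <arpa/inet.h>   // htonl, ntohl
#include <cstdint>
#include <unistd.h>      // ::write, ::read

void write_u32(int fd, std::uint32_t value)
{
    std::uint32_t wire = htonl(value);   // host order -> network order
    ::write(fd, &wire, sizeof wire);
}

std::uint32_t read_u32(int fd)
{
    std::uint32_t wire = 0;
    ::read(fd, &wire, sizeof wire);
    return ntohl(wire);                  // network order -> host order
}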

Martin York
Would writing `sizeof(int)` instead of `4` remedy that?
knight666
No. Using `sizeof(int)` solves the problem that `int` is not always 4 bytes. You need to use `htonl()` and family to get a specific endianness.
Martin York
@Martin: the interface you provided doesn't imply such a conversion would occur, so in general for such an interface I would expect the user to have already converted the value to network byte order, or else they don't care. In short all that statement says is write/read the amount of an int from/into this variable to/from this fd, which could be a socket, a pipe etc... a solution could be to just give it a better name.
Beh Tou Cheh
@Beh: That's exactly the problem: they don't care, because they don't realize it matters. Assuming a file means requiring compatible hardware across upgrades; assuming a stream requires a homogeneous set of computers. You are correct that the interface I provided does not imply this conversion, but I was trying to make a small example; you can assume the context is a transport layer. The transport layer should deal with all the conversions required to get the data to the destination. Otherwise the coupling of business code to the transport layer is high.
Martin York
@Beh: My main point being the incompatibility of hardware data representations. Code that depends on the underlying hardware representation of the data will work at small scale (testing/dev), but will fail when applied to real-world situations.
Martin York
@Martin York: However, `sizeof(int)` introduces another possibility: that `sizeof(int)` will differ between the two platforms. (If the endianness can change, so can other facets of data representation.) Much better to specify a definite length and account for endianness, and handle the case where `sizeof(int) != 4` in the appropriate place.
David Thornley
A: 

I think that as a paradigm, thinking too much in terms of undefined and defined is a bad idea.

I see a lot of braindead C++ lawyering on here, and I think it causes a lot of the really bad thinking which causes a lot of the really bad programming that seems to be the norm.

Of course I'm not suggesting to do stuff that has no guaranteed result, but usually the train to undefined land has a first stop when you try to use all of the crazy features that rules lawyering will lead you to in the first place. Like using const improperly (almost all uses of it), complicated initializers in the constructor, exceptions (due to the implementation), etc.

You need to understand more the why than the what of everything or you are just doomed. Especially when the rules of C++ change all the time, generally making them much more complicated and buggy. I see all the time someone posts some mindbogglingly stupid answer because all they seem to know is the C++ rules, not how compilers have to work or how languages generally handle things or why things are the way they are.

Charles Eli Cheese
+2  A: 

There is the static initialization order problem. When you define two static objects in different .cpp files and one is initialized from the other, the initialization order across translation units is unspecified. In my opinion, this renders such static data almost useless.

The workaround is to wrap the static value in a function, as follows:

Value &value()
{
    static Value ret(otherStaticInstance); // constructed on first call, so
                                           // it is ready whenever it's used
    return ret;
}
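
For contrast, a hedged sketch of the pattern that breaks; file and type names are illustrative:

// value.h
struct Value {
    int n;
    explicit Value(int v) : n(v) {}
};

// a.cpp
#include "value.h"
Value otherStaticInstance(7);    // constructed at some point before main()

// b.cpp
#include "value.h"
extern Value otherStaticInstance;
Value v(otherStaticInstance.n);  // if b.cpp happens to initialize first, this
                                 // reads the zero-initialized 0, not 7:
                                 // cross-.cpp order is unspecified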
Dimitri C.
+1  A: 

When you define two functions with the same name in the .cpp files only (one in each of two separate .cpp's), the compiler won't complain, but one of the two implementations gets picked arbitrarily: an ODR violation. The workaround is to wrap each function in an unnamed namespace.
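
A minimal sketch of that workaround; the unnamed namespace gives each definition internal linkage, so the two files no longer clash (file names are illustrative):

// a.cpp
namespace {
    int helper() { return 1; }  // internal linkage: visible only in a.cpp
}
int callerA() { return helper(); }

// b.cpp
namespace {
    int helper() { return 2; }  // a distinct function; no ODR violation
}
int callerB() { return helper(); }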

Dimitri C.
Do you mean the same .cpp file? Why would you do that?
configurator
@configurator: I mean in different .cpp's.
Dimitri C.
Wow, I had no idea that could happen. Sounds dangerous!
configurator
A: 

Overflow on signed integers is undefined.

But the following code, assuming both a and b are positive, just works everywhere :)

int a, b, maxval;
/* maximize some value */
if (a + b < a) { /* oops: relies on signed wraparound */ }
else { maxval = a + b; }

The proper way is to test for overflow before causing the undefined behaviour:

#include <limits.h> /* for INT_MAX */

int a, b, maxval;
/* maximize some value */
if (a > INT_MAX - b) { /* oops */ }
else { maxval = a + b; }
pmg
*Everywhere*? It won't even always work with gcc! Check out this documentation on `-fstrict-overflow`: http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#index-fstrict_002doverflow-789 - according to that, gcc can assume that the test in your `if` statement will never be true, and entirely optimise out that branch.
caf
There have been platforms that would throw some sort of exception for arithmetic overflow, although I'm not aware of any that are currently in widespread use. The reasoning (see the original C89 Rationale) was that mandating either flagging some sort of exception or wrapping around would force signed arithmetic to be inefficient on some platforms. The Committee was willing to require unsigned arithmetic to follow rules, as they thought maximum efficiency was less important there.
David Thornley