As I understand from my reading, undefined-behavior is the result of leaving the compiler with several non-identical alternatives at compile time. However, wouldn't that mean that if one were to follow strict coding practice (like putting each assignment and each equality in a separate statement, proper debugging and commenting), then finding the source of the undefined-behavior shouldn't pose a significant problem?

Further, for each error that comes up, if you identify the offending code, you should know what statements can be used in that particular statement's stead, correct?

EDIT: I'm not interested in places where you have written code that you didn't mean to write. I'm interested in examples where code that is sound by mathematical logic fails to work.

Also, I consider 'good coding practice' to be strong informative comments every few lines, proper indentation, and debugging dumps on a regular basis.

+1  A: 

I'm not sure if there's a formal definition of "undefined behavior", but following good coding standards can reduce ambiguity and lead to fewer compile and runtime defects.

However, getting two programmers to agree on what "good coding standards" are is a complicated and error-prone process.

To your second question: yes, compilers will generally output an error code that you can use to fix the problem.

Sam Post
"formal definition of 'undefined behavior'" made me laugh a little...
The standard does define *undefined behavior* and will usually explicitly say "not doing X" or "doing Y" is UB. Understanding those situations and recognizing when they crop up in your code is another matter. :)
Roger Pate
_1.3.12 undefined behaviour_ is a part of the language standard. I think that that is as formal a definition as you are likely to expect.
Charles Bailey
+10  A: 

Undefined behavior isn't necessarily leaving the compiler with multiple alternatives. Most commonly it is simply doing something that doesn't make sense.

For example, take this code:

int arr[2];
arr[200] = 42;

This is undefined behavior. It's not that the compiler was given multiple alternatives to choose from; it's just that what I'm doing does not make sense. Ideally, it should not be allowed in the first place, but without potentially expensive runtime checking, we can't guarantee that something like this won't occur in our code. So in C++, the rule is simply that the language specifies only the behavior of a program that sticks to the rules. If it does something erroneous, as in the above example, it is simply undefined what should happen.

Now, imagine how you're going to detect this error. How is it going to surface? It might never seem to cause any problems. Perhaps we just so happen to write into memory that's mapped to the process (so we don't get an access violation), but is never otherwise used (so no other part of the program will read our garbage value, or overwrite what we wrote). Then it'll seem like the program is bug-free and works just fine.

Or it might hit an address that's not even mapped to our process. Then the program will crash immediately.

Or it might hit an address that's mapped to our process, but at some point later will be used for something. Then all we know is that sooner or later, the function reading from that address will get an unexpected value, and it'll behave weird. That part is easy to spot in the debugger, but it doesn't tell us anything about when or from where that garbage value was written. So there's no simple way to trace the error back to its source.
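
As a minimal sketch of the runtime checking mentioned above: the standard containers offer a checked accessor, `at()`, which throws `std::out_of_range` rather than invoking undefined behavior. The helper name `try_store` is hypothetical and exists only for this illustration.

```cpp
#include <array>
#include <cstddef>
#include <stdexcept>

// Hypothetical helper: stores value at index through the checked accessor.
// Returns true on success, false if the index was out of bounds.
bool try_store(std::array<int, 2>& arr, std::size_t index, int value) {
    try {
        arr.at(index) = value;  // at() validates the index and throws on failure
        return true;
    } catch (const std::out_of_range&) {
        return false;           // a defined, detectable failure instead of UB
    }
}
```

With this, `try_store(arr, 200, 42)` reports the problem at the point where it happens, at the cost of a bounds check on every access.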

jalf
Wait... I thought undefined-behavior was distinct from a programming error... I mean, something like that is not what I would consider an ambiguity in rules. I would define it as my own stupidity. Aside from that, I saw an example where a triple assignment statement gave different values on different machines. I'm more curious about this kind of undefined-behavior.
piggles
They're different concepts. "Undefined behavior" is simply "anything that the C++ language standard did not specify". A classic example is "what should happen if I write outside the bounds of an array". It is generally a programmer error to *rely* on undefined behavior. Undefined behavior in itself isn't right or wrong. But if it occurs in your program, you have a problem.
jalf
As for your example of multiple assignments, the same applies. The problem is that the standard does not specify how it should be evaluated. That obviously means that it is an error to rely on any specific behavior from that code. Undefined behavior is not due to ambiguity in rules. It is simply the absence of rules. The rules specify what should happen if you do *one* assignment in a statement. If you do two, they simply have nothing to say. It's not a conflict between two possible interpretations of the rules. There simply *are* no rules that apply.
jalf
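To make the multiple-assignment case concrete, here is a hedged sketch. The expression in the comment modifies the same variable more than once without any ordering between the modifications, which is the kind of statement that can give different results on different machines. The function `increment_twice_and_sum` is a hypothetical name for a rewrite that does one modification per statement.

```cpp
// An expression such as
//     i = i++ + ++i;
// modifies i more than once with no ordering between the modifications,
// so the standard places no requirements on the result.
//
// Hypothetical rewrite: one modification per statement, so every step
// has exactly one meaning.
int increment_twice_and_sum(int i) {
    int first = i;  // value before any increment
    i += 1;
    i += 1;
    int last = i;   // value after both increments
    return first + last;
}
```

This only pins down one of the meanings the original author might have intended; the point is that the intent has to be written out explicitly.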
Undefined behavior and programming error are orthogonal concepts. I can *correctly* use constructs which result in "undefined behavior" from the language's perspective. For example, if I know specifics of my implementation, I can directly access and read arbitrary memory locations by casting an int to a pointer (which is "undefined behavior").
+1. The comments have answered my question! repost as answer so I can accept please.
piggles
@jalf: if it occurs in your program and you know it's there, you don't necessarily have a problem. You even said it: "Undefined behavior in itself isn't right or wrong."
STingRaySC: Actually, int to pointer is *not* UB, but it does depend on implementation, see 5.2.10/5.
Roger Pate
@STingRaySC: You're right, if you really know your platform, then you might in rare cases rely on UB without it being an error. But more commonly, you're thinking of un*specified* behavior, or implementation-defined behavior. For example, the result of casting an int to a pointer is implementation-defined, not undefined. Undefined means "there are no rules". When rules exist, but they're defined by the platform rather than the language, it's generally implementation-defined, and so a lot less evil.
jalf
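A small sketch of the implementation-defined case: the integer you get from a pointer is up to the implementation, but converting it back to the same pointer type gives the original pointer. This assumes the implementation provides `std::uintptr_t` (it is an optional type), and `round_trip` is a hypothetical helper name.

```cpp
#include <cstdint>

// Converts a pointer to an integer and back again.
// The integer value itself is implementation-defined, but converting it
// back to the original pointer type yields the original pointer value.
int* round_trip(int* p) {
    std::uintptr_t bits = reinterpret_cast<std::uintptr_t>(p);
    return reinterpret_cast<int*>(bits);
}
```

Casting an arbitrary integer that did not come from a valid pointer is a different matter: the mapping is still implementation-defined, and dereferencing the result is only meaningful if the platform documents what lives at that address.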
@Roger Pate: the point is that I don't consider the actual answer part of the answer, only the two comments. I suppose I could accept this anyway, though :-p
piggles
There is a distinction between "undefined" and "unspecified?" I understand the difference between implementation-defined and undefined. I believe that given some set of constraints, you can write code that relies on UB and is logically valid. I can't come up with a concrete example at the moment. I think of it as the *scope* of definition. Defined is guaranteed by the language itself. Implementation-defined is guaranteed by the particular platform/compiler/environment, and is thus "undefined" to the language itself. Undefined is guaranteed by my own fabricated constraints and not impl.
@Mechko: I'm starting to be unsure that what you are asking about is really what is considered UB by the language standard.
@STingRaySC You're right, methinks. Given the response I got as opposed to the response that I expected, I'm starting to suspect that my conception of UB is formed too much from the horror stories that people told me and too little from the language standard.
piggles
STingRaySC: I went ahead and included those definitions in my answer too, as that's probably the easiest.
Roger Pate
@STingRaySC: Yes, the C++ standard is very specific about this. "Undefined" is what I've said above. *No* rules apply. An example is writing to arbitrary memory addresses. No matter your platform, that is bad. Implementation-defined is legal, but the rules are set by the platform/compiler, and they must document these rules. And "unspecified" is similar, except that they *don't* have to document this.
jalf
For example, the order of evaluation of function parameters is unspecified. That is, it must happen, and it must happen in a sane way and without doing anything unexpected, but the order in which they're evaluated is left up to the compiler, and it is not required to tell you which strategy it chooses.
jalf
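A sketch of the unspecified-order case, assuming hypothetical helpers `traced`, `add`, and `sum_with_trace`: the result is always the same, but the log of which argument was evaluated first may differ between compilers.

```cpp
#include <string>
#include <vector>

// Records the order in which the argument expressions were evaluated.
std::vector<std::string> call_log;

// Logs its name as a visible side effect, then returns the value unchanged.
int traced(const std::string& name, int value) {
    call_log.push_back(name);
    return value;
}

int add(int a, int b) { return a + b; }

// The sum is always 3, but call_log may end up as {"left", "right"} or
// {"right", "left"}: the standard permits either order.
int sum_with_trace() {
    call_log.clear();
    return add(traced("left", 1), traced("right", 2));
}
```

Code that depends on only the sum is fine everywhere; code that depends on the contents of `call_log` being in a particular order is relying on unspecified behavior.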
When you're writing portable code, of course, implementation-defined behaviour often might as well be undefined, because you can't rely on any implementation notes when you're writing the code. For instance it's implementation-defined whether `(int*)1 + 1` has defined behaviour or not, since it's implementation-defined whether `(int*)1` is a trap representation. So if you're writing for a specific implementation, then it's either defined or undefined. If you're writing portable code, it's "may be undefined", so probably you avoid it anyway.
Steve Jessop
@jalf: Writing to arbitrary memory locations is indeed *undefined* by the language standard and the implementation, *but* if I know what is there, it's not undefined by me, the programmer.
It is not enough to know what is located at a specific address. You *also* have to know that the compiler's undefined behavior (dereference pointer to arbitrary memory) is consistent enough to be relied on. The compiler *could* insert all sorts of side effects. That is what "undefined" means. Anyway, when discussing C++, I think we should stick to the official definitions. "Undefined behavior" is behavior undefined by the language. What you choose to define or not define is your own business, and is just adding confusion.
jalf
+1  A: 

As I understand from my reading, undefined-behavior is the result of leaving the compiler with several non-identical alternatives at compile time.

While that may be one source of undefined behavior, you're speaking too abstractly. You need a specific example of what you mean by "non-identical alternatives at compile time."

If by "follow strict coding practice," you mean don't use logic that results in undefined behavior, then yes (because there would be no undefined behavior). Tracking down a bug because of undefined behavior may or may not be easier than tracking one caused by a logic error.

Note that code which results in "undefined behavior" is still legal C++ code. I consider it a class of code/logic that should only very seldom be used, when that "undefined behavior" is predictable for a given program on a given platform with a given implementation. You will find cases where what the language considers "undefined behavior" will in fact be defined for a particular environment/set of constraints.

It's "legal C++ code"; however, if UB occurs at any time in the program, the standard makes no guarantees on the behavior of that program, *even if the UB occurs after the behavior in question.*
Roger Pate
I wholly agree.
The particular part of the question you quoted would be describing *unspecified behavior*, which is different from UB.
Roger Pate
"legal C++ code" is tricky. The standard doesn't use the word "legal". I'd personally tend to say "legal" is when you have a program that stays within the boundaries of what is defined by the standard (including implementation-defined ones). But you could also say a C++ program is "legal" as long as it compiles. The syntax is accepted by a C++ compiler, after all. Depending on what you mean by "legal", a program that relies on UB may or may not be "legal".
jalf
There are a *few* cases where the compiler or platform actually offers guarantees about what is otherwise UB, but they are *really* rare. Usually, what you're thinking of would be unspecified behavior, not undefined.
jalf
By "legal," I mean, *it compiles*.
"Legal" usually means almost the same as "well-formed program" (which is defined by the standard); the big difference is some violations of the ODR don't require diagnostics, but jalf is right that it's imprecise.
Roger Pate
By legal you mean something that is grammatically correct, such as "Rain is dry". But this statement makes no sense, and is hence not guaranteed to be understood. In one situation you may be simply laughed at, in another you might be institutionalized. Good analogy?
piggles
No the analogy doesn't work for me. I should know better than to use terms like "legal." This point is being belabored now for no benefit...
+7  A: 
Roger Pate
Okay, I give up: why does your `std::string::data()` example invoke UB? The pointer returned by `std::string::data()` is valid until the next non-const operation on the string, but by the time you modify it, `cout.write()` is already done with it. (I'm probably misunderstanding something; my copy of the C++ 98 standard says that `std::basic_string::operator[](pos)` is equivalent to `std::basic_string::data()[pos]`, but `data()`'s description says that programs aren't allowed to mutate the returned character array, which only raises the question why `operator[]` can return a non-const ref.)
jamesdlin
21.3/5: "References .. referring to elements .. may be invalidated by .. calling data() .."
Roger Pate
And yes, that definition of op[] is problematic and there's a defect report about it, but not because of the constness (that's always been a rather trivial issue for me, it's just saying `s.data()[n]` and `s[n]` are the same object in those cases).
Roger Pate