views: 100
answers: 3

I understand the reasons for compiler/interpreter language extensions, but why is behaviour that has no valid definition allowed to fail silently or do weird things rather than throwing a compiler error? Is it because of the extra difficulty (impossible, or simply too time-consuming) for the compiler to catch it?

P.S. Which languages have undefined behaviour and which don't?

P.P.S. Are there instances of undefined behaviour that are not impossible or too time-consuming to catch at compile time, and if so, are there any good reasons/excuses for those?

+4  A: 

The concept of undefined behaviour is required in languages like C and C++, because detecting the conditions that cause it would be impossible or prohibitively expensive. Take for example this code:

int * p = new int(0);
// lots of conditional code, somewhere in which we do
int * q = p;
// lots more conditional code, somewhere in which we do
delete p;
// even more conditional code, somewhere in which we do
delete q;

Here the pointer has been deleted twice, resulting in undefined behaviour. Detecting this kind of error at compile time is too hard for a language like C or C++.
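
To see why compile-time detection is effectively impossible in general, here is a minimal sketch (the input value is made up for illustration) in which the double delete only happens for some runtime inputs, so no static check can decide it in all cases:

#include <iostream>

int main() {
    int * p = new int(0);
    int * q = p;               // q now aliases p

    int n = 0;
    std::cin >> n;             // value unknown until the program runs

    if (n > 41)
        delete p;              // frees the object only for some inputs

    delete q;                  // double delete (undefined behaviour) when n > 41
    return 0;
}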

anon
Also note that allowing undefined behaviour has become somewhat less common with more modern languages. This makes implementing a compiler/runtime a bit harder, but is a **huge** advantage for developers that *use* the language.
Joachim Sauer
@Joachim Actually, in my code I would say I have a problem with UB maybe once every 6 months, if that. So getting rid of it would not be a huge advantage for me.
anon
@Neil: I'm just saying that some languages such as Java don't have any "undefined behavior" and that fact makes it **much** easier to reason about the code, among other things.
Joachim Sauer
@Joachim: it makes it easier to reason about incorrect code (in Java you can be somewhat more confident that a bug isn't a security flaw, and that even if it is a security flaw, that's partly someone else's fault, not just yours). Once the application code does what it's designed to do, then it's about equally easy to reason about in either. Just because you *can* use a null reference in Java (throwing a NPE) or deref a null pointer in C (undefined), doesn't mean you're in a good place in either language if your reasoning about your app involves doing so...
Steve Jessop
A: 

Because different operating systems behave differently (...), and the standard can't just say "crash in this case", because handling the situation may be something the operating system can do better.
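
As a minimal sketch of that point: the same out-of-bounds write might crash on one system, be trapped by a debugging allocator on another, or silently overwrite unrelated memory, so the standard cannot promise any single outcome and leaves the behaviour undefined:

int main() {
    int buffer[4] = {0, 1, 2, 3};
    int i = 7;          // deliberately out of range
    buffer[i] = 42;     // undefined behaviour: the result depends on the
                        // platform's memory layout and protection
    return buffer[0];
}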

LukeN
A: 

Largely because, to accomplish certain purposes, it's necessary. Just for example, C and C++ were originally used to write operating systems, including things like device drivers. To do that, they used (among other things) direct access to specific hardware locations that represented I/O devices. Preventing access to those locations would have prevented C from being used for its intended purpose (and C++ was specifically targeted at allowing all the same capabilities as C).
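
A minimal sketch of that kind of hardware access (the address below is made up for illustration; a real driver takes it from the device documentation): the code treats a fixed address as a device register, something the language standard gives no portable meaning but which the intended platform supports:

#include <cstdint>

// Hypothetical memory-mapped register address, for illustration only.
volatile std::uint32_t * const device_control =
    reinterpret_cast<volatile std::uint32_t *>(0x40021000u);

void enable_device() {
    *device_control |= 0x1u;   // set a control bit directly in the device
}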

Another factor is a really basic decision between specifying a language and specifying a platform. To use the same examples, C and C++ are both based on a conscious decision to restrict the definition to the language, and leave the platform surrounding that language separate. Quite a few of the alternatives, with Java and .NET as a couple of the most obvious examples, specify entire platforms instead.

Both of these reflect basic differences in attitude about the design. One of the basic precepts of the design of C (largely retained in C++) was "trust the programmer". Though it was never stated quite so directly, the basic "sandbox" concept of Java was/is based on the idea that you should not trust the programmer.

As far as what languages do/don't have undefined behavior, that's the dirty little secret: for all practical purposes, all of them have undefined behavior. Some languages (again, C and C++ are prime examples) go to considerable effort to point out what behavior is undefined, while many others either try to claim it doesn't exist (e.g., Java) or mostly ignore many of the "dark corners" where it arises (e.g., Pascal, most .NET).

The ones that claim it doesn't exist generally produce the biggest problems. Java, for example, includes quite a few rules that try to guarantee consistent floating point results. In the process, they make it impossible to execute Java efficiently on quite a bit of hardware -- but floating point results still aren't really guaranteed to be consistent. Worse, the floating point model they mandate isn't exactly perfect, so under some circumstances it prevents you from getting the best results you could (or at least makes you do a lot of extra work to get around what it mandates).
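
One concrete way this shows up, sketched in C++ (where the extra intermediate precision is permitted rather than forbidden): on hardware whose floating point registers are wider than a double, the same expression can round differently unless the compiler inserts extra rounding steps -- which is exactly the cost a "one exact answer" model imposes:

#include <cfloat>
#include <cstdio>

int main() {
    double a = 1e16, b = 1.0, c = -1e16;
    double r = (a + b) + c;   // 0.0 with plain double arithmetic, but may be
                              // 1.0 where intermediates keep x87 extended precision
    std::printf("FLT_EVAL_METHOD = %d, r = %g\n", FLT_EVAL_METHOD, r);
    return 0;
}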

To their credit, Sun/Oracle has (finally) started to notice the problem, and is now working on a considerably different floating point model that should be an improvement. I'm not sure if this has been incorporated in Java yet, but I suspect that when/if it is, there will be a fairly substantial "rift" between code for the old model and code for the new model.

Jerry Coffin