views:

230

answers:

7

Obviously, dereferencing an invalid pointer causes undefined behavior. But what about simply storing an invalid memory address in a pointer variable?

Consider the following code:

const char* str = "abcdef";
const char* begin = str;
if (begin - 1 < str) { /* ... do something ... */ }

The expression begin - 1 evaluates to an invalid memory address. Note that we don't actually dereference this address - we simply use it in pointer arithmetic to test if it is valid. Nonetheless, we still have to load an invalid memory address into a register.

So, is this undefined behavior? I never thought it was, since a lot of pointer arithmetic seems to rely on this sort of thing, and a pointer is really nothing but an integer anyway. But recently I heard that even the act of loading an invalid pointer into a register is undefined behavior, since certain architectures will automatically throw a bus error or something if you do that. Can anyone point me to the relevant part of the C or C++ standard which settles this either way?

+2  A: 

Any use of an invalid pointer yields undefined behaviour. I don't have the C Standard here at work, but see 'invalid pointers' in the Rationale: http://www.open-std.org/jtc1/sc22/wg14/www/C99RationaleV5.10.pdf

fizzer
If that's the case, couldn't you just cast all your pointers to a `ptrdiff_t` when doing pointer arithmetic? In other words, if I changed my above code sample to read `if ((ptrdiff_t)begin - 1)` would that no longer be undefined behavior?
Channel72
Not undefined behaviour, but the result is implementation defined. That is, your implementation will document some reasonable behaviour, but it will not be portable, and may not be useful.
fizzer
The comp.lang.c FAQ addresses this: http://c-faq.com/ptrs/int2ptr.html. Like I said, I don't have the Standard to hand.
fizzer
Note that ptrdiff_t will hold the *difference between* pointers, not pointers themselves. This is not the same thing.
fizzer
+6  A: 

I have the C Draft Standard here, and it makes it undefined by omission. It defines the case of ptr + I at 6.5.6/8 for

  • If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integer expression.
  • Moreover, if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object, and if the expression Q points one past the last element of an array object, the expression (Q)-1 points to the last element of the array object.

Your case does not fit any of these. Your array is not large enough for -1 to adjust the pointer to a different array element, and neither the result nor the original pointer points one past the end.
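To make the defined cases concrete, here is a minimal sketch. The helper names `count_forward` and `one_past_end` are hypothetical and not from the standard or the question; they only illustrate pointers that stay inside the array or exactly one past its last element.

```cpp
#include <cstddef>

// Hypothetical helper: counts the elements in [first, last) by walking
// forward. Every pointer formed here points into the array or one past
// its last element, which are the cases 6.5.6/8 defines.
std::size_t count_forward(const char* first, const char* last)
{
    std::size_t n = 0;
    for (const char* p = first; p != last; ++p) {
        ++n;
    }
    return n;
}

// Hypothetical helper: returns a pointer one past the last element of a
// built-in array. Forming this pointer is defined; dereferencing it is not.
template <std::size_t N>
const char* one_past_end(const char (&arr)[N])
{
    return arr + N;
}
```

For an array declared as `const char s[] = "abcdef";`, the expression `one_past_end(s) - 1` points at the terminating null character, which is the last element. Nothing comparable is defined for `s - 1`.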

Johannes Schaub - litb
Is this undefined or unspecified behavior? I would expect the code to run and work with no bad consequences, though whether it entered the if branch would be unknowable (via the standard).
Martin York
@Martin York: The C++ standard defines this to be undefined behavior even if the pointer is not dereferenced. I hope I have picked up the relevant quote in my post.
Chubsdad
It is behavior which could cause a hardware fault on hardware which validates the contents of pointer registers. As such, it is Undefined Behavior. It is possible and permissible for a particular implementation to specify what will happen if programs do various things that, per the standard, evoke Undefined Behavior. If an implementation conforms to its own spec, the behavior will then be well-defined. If the code is run on a different implementation which conforms to the C standard, but not to that particular implementation's specs, however, the program may fail in arbitrary ways.
supercat
+4  A: 

Your code is undefined behavior for a different reason:

the expression begin - 1 does not yield an invalid pointer. It is undefined behavior. You are not allowed to perform pointer arithmetic beyond the bounds of the array you're working on. So it is the subtraction itself that is invalid, and not the act of storing the resulting pointer.
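One way to avoid the problematic subtraction is to compare against the start of the array before decrementing, so that `begin - 1` is never computed. The following is only a sketch; `find_last` is a hypothetical helper name, not something from the question.

```cpp
// Hypothetical helper: finds the last occurrence of `ch` in [begin, end),
// scanning backwards without ever computing `begin - 1`.
const char* find_last(const char* begin, const char* end, char ch)
{
    const char* p = end;      // one past the end: valid to form
    while (p != begin) {      // compare first, so p never goes below begin
        --p;                  // p now points at a real element
        if (*p == ch) {
            return p;
        }
    }
    return nullptr;           // not found (requires C++11 for nullptr)
}
```

The loop tests `p != begin` before the decrement, so every value `p` takes lies inside the array or one past its end.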

jalf
The C99 Rationale (linked to in my answer) specifically mentions pointer arithmetic beyond the bound of the array as yielding invalid pointers.
fizzer
If the expression was modified to `(ptrdiff_t)begin - 1`, would that still yield undefined behavior? Since ptrdiff_t has to be a signed integral type, I would think this would be okay.
Channel72
A ptrdiff_t may only be calculated for two pointers into the same data object. The only exception to the "within the bounds of the array" is a pointer *one* beyond the *end* of the array.
DevSolar
@fizzer: I don't have the C++ standard here (formatted my computer a few days ago, and still need to grab that from my backups), but it states that this is undefined. I don't know if C does it differently, but I'd imagine that it's just that rationale deals with what *actually* happens (in reality, you just get an invalid pointer), but the standard is more strict and says "it's a nonsensical operation, it is undefined".
jalf
@Channel72: Yes, as long as the following are all true: (1) `sizeof(ptrdiff_t) >= sizeof(void*)` (this isn't necessarily guaranteed), (2) casting `begin` to the signed integer type `ptrdiff_t` doesn't yield the minimum value representable by that type (if it does, the subtraction results in undefined behavior), and (3) the implementation defines conversion of a pointer to an integer consistently, so that you can compare the result of this expression with the result of `(ptrdiff_t)str` and get a meaningful result (also not guaranteed).
James McNellis
And (4), the result of the cast results in a value that is representable by a `ptrdiff_t` (the result of the cast might exceed the maximum value representable by a `ptrdiff_t`) [Those are for C, where there is an implicit conversion from pointer to integer; at least that's my understanding of it. I'd think the same is true for C++; the problem is that converting a pointer to an integer has implementation-defined results.]
James McNellis
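As a hedged illustration of the integer-cast discussion above: the thread only talks about `ptrdiff_t`, but where the optional type `std::uintptr_t` exists it is the integer type intended for holding pointer values, so this sketch assumes it is available. The helper names `to_integer` and `round_trip` are hypothetical.

```cpp
#include <cstdint>

// Hypothetical helper: converts a pointer to an unsigned integer wide
// enough to hold it. The numeric value is implementation-defined.
std::uintptr_t to_integer(const void* p)
{
    return reinterpret_cast<std::uintptr_t>(p);
}

// Hypothetical helper: converts the pointer to an integer and back.
// The result compares equal to the original pointer.
const void* round_trip(const void* p)
{
    return reinterpret_cast<const void*>(to_integer(p));
}
```

Because `std::uintptr_t` is unsigned, `to_integer(begin) - 1` is ordinary modular arithmetic and cannot overflow. Whether comparing such integers says anything meaningful about the relative order of the addresses is still up to the implementation.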
A: 

The access violation will occur when you dereference the variable. Consider the following:

#include <cstddef> // for NULL

void IntRef(int &value)
{
    // At this point the code will not have thrown an exception. 
    // The minute you attempt to use "value" it will crash.
    value = 42; // Crash
}

int main()
{
    int *const pNull = NULL;
    IntRef(*pNull);
    return 0;
}
Mark Ingram
You got undefined behavior when you did `*pNull`.
GMan
+2  A: 

§5.7/6 - "Unless both pointers point to elements of the same array object, or one past the last element of the array object, the behavior is undefined."

Summary, it is undefined even if you do not dereference the pointer.
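For the pointer-minus-pointer case that the quoted paragraph covers, a minimal sketch looks like this. `index_in` is a hypothetical helper name used only for illustration.

```cpp
#include <cstddef>

// Hypothetical helper: signed distance from `base` to `p`, in elements.
// Both pointers must point into the same array (or one past its last
// element); otherwise the subtraction itself is undefined.
std::ptrdiff_t index_in(const char* base, const char* p)
{
    return p - base;
}
```

With `const char s[] = "abcdef";`, the call `index_in(s, s + 7)` is fine because `s + 7` is one past the last element. Passing pointers into two unrelated arrays would not be.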

Chubsdad
That text concerns subtraction of a pointer from a pointer; the OP is subtracting an integer from a pointer.
James McNellis
@James McNellis: That's about pointer arithmetic I guess. Ultimately it's about the resultant pointer value
Chubsdad
@Chubsdad: I am unsure about your reasoning. Subtracting two pointers into different arrays can indeed cause issues, because the pointers may point to different memory zones (think far/near memory on 16-bit architectures). There is nothing here about meddling with the pointers themselves; in fact, it is quite common to use the upper bits of 64-bit pointers to store additional flags.
Matthieu M.
A: 

Yes, it's undefined behavior. See the accepted answer to this closely related question. Assigning an invalid pointer to a variable, comparing an invalid pointer, casting an invalid pointer triggers undefined behavior.

sharptooth
+2  A: 

Some architectures have dedicated registers for holding pointers. Putting the value of an unmapped address into such a register is allowed to crash. Integer overflow/underflow is allowed to crash. Because C aims to work on a broad variety of platforms, pointers provide a mechanism for safely programming unsafe circuits.

If you know you won't be running on exotic hardware with such finicky characteristics, you don't need to worry about what is undefined by the language. It is well-defined by the platform.

Of course, the example is poor style and there isn't a good reason to do it.

Potatoswatter