I have seen it asserted several times now that the following code is not allowed by the C++ Standard:

int array[5];
int *array_begin = &array[0];
int *array_end = &array[5];

Is &array[5] legal C++ code in this context?

I would like an answer with a reference to the Standard if possible.

It would also be interesting to know if it meets the C standard. And if it isn't standard C++, why was the decision made to treat it differently from array + 5 or &array[4] + 1?

+2  A: 

Even if it is legal, why depart from convention? array + 5 is shorter anyway, and in my opinion, more readable.

Edit: If you want it to be symmetric, you can write

int* array_begin = array; 
int* array_end = array + 5;
rlbond
I think that the style I use in the question looks more symmetrical: the array declaration and the begin/end pointers, or sometimes I pass those directly to an STL function. That is why I use it instead of the shorter version.
Zan Lynx
@rlbond: To be symmetrical I think it'd need to be array_begin = array + 0; array_end = array + 5; How's that for a long delayed comment response?
Zan Lynx
It might be a world record :)
rlbond
+12  A: 

It is legal.

According to the gcc documentation for C++, &array[5] is legal. In both C++ and in C you may safely address the element one past the end of an array - you will get a valid pointer. So &array[5] as an expression is legal.

However, it is still undefined behavior to attempt to dereference pointers to unallocated memory, even if the pointer points to a valid address. So attempting to dereference the pointer generated by that expression is still undefined behavior (i.e. illegal) even though the pointer itself is valid.

In practice, I imagine it would usually not cause a crash, though.

Edit: By the way, this is generally how the end() iterator for STL containers is implemented (as a pointer to one-past-the-end), so that's a pretty good testament to the practice being legal.

Edit: Oh, now I see you're not really asking if holding a pointer to that address is legal, but if that exact way of obtaining the pointer is legal. I'll defer to the other answerers on that.

Tyler McHenry
Tyler McHenry
Evan Teran
@Evan: There's more to this. Check out the last line of core issue 232: http://std.dkuug.dk/JTC1/SC22/WG21/docs/cwg_active.html#232. The last example there just looks wrong - but they clearly explain that the distinction is on the "lvalue-to-rvalue" conversion, which in this case doesn't take place.
Richard Corden
@Richard: interesting, seems there is some debate on the subject. I'd even agree that it **should** be allowed :-P.
Evan Teran
@Evan Teran: No, it does not de-reference the member unless you try to read from or write to that area. Think of it as a reference to the member: it will not be de-referenced unless you try to obtain or change the value. Taking the address does not cause a read or write and thus does not de-reference the value.
Martin York
It is the same kind of undefined behavior as the "reference-to-NULL" thing people kept discussing, where seemingly everyone voted up the answer saying "it is undefined behavior".
Johannes Schaub - litb
@Richard, note also that they agree so far that the difference should be an lvalue to rvalue conversion. But they find that this is not well reflected in the Standard. The same issue report can be found here which has the other points they noted included (including the concept of an "empty lvalue"): http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_active.html#232
Johannes Schaub - litb
Richard Corden
A: 

Working draft (n2798):

"The result of the unary & operator is a pointer to its operand. The operand shall be an lvalue or a qualified-id. In the first case, if the type of the expression is “T,” the type of the result is “pointer to T.”" (p. 103)

array[5] is not a qualified-id as best I can tell (the list is on p. 87); the closest would seem to be identifier, but while array is an identifier array[5] is not. It is not an lvalue because "An lvalue refers to an object or function. " (p. 76). array[5] is obviously not a function, and is not guaranteed to refer to a valid object (because array + 5 is after the last allocated array element).

Obviously, it may work in certain cases, but it's not valid C++ or safe.

Note: It is legal to add to get one past the array (p. 113):

"if the expression P [a pointer] points to the last element of an array object, the expression (P)+1 points one past the last element of the array object, and if the expression Q points one past the last element of an array object, the expression (Q)-1 points to the last element of the array object. If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow"

But it is not legal to do so using &.

Matthew Flaschen
Care to explain the down-vote?
Matthew Flaschen
I upvoted you, because you are correct. There is no object guaranteed to be located at the past-the-end location. The person that downvoted you probably misunderstood you (you sound like you say any array-index-op refers to no object at all). I think here is an interesting thing: It *is* an lvalue, but it also does *not* refer to an object. And so here is a contradiction to what the standard says. And so, this yields undefined behavior :) This is also related to this one: http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_active.html#232
Johannes Schaub - litb
@litb: According to 3.9.2:3, there is "an unrelated object of the array's element type" at the past-the-end location. Doesn't that mean that the result of array[5] *is* an lvalue?
jalf
@jalf, the note says "that might be located at that address". It's not guaranteed that there is one located :)
Johannes Schaub - litb
Matthew Flaschen
Charles Bailey
"However, I still think it is not an lvalue. Because there is no object guaranteed to be at array[5], array[5] can not legally /refer/ to an object." <- That is exactly why i think it is undefined behavior: It relies on some behavior not explicitly specified by the standard, and thus falls within 1.3.12[defns.undefined]
Johannes Schaub - litb
litb, fair enough. Let's say it's /not definitely/ an lvalue, and thus /definitely not/ 100% safe.
Matthew Flaschen
+11  A: 

Your example is legal, but only because you're not actually using an out of bounds pointer. Let's deal with out of bounds pointers first: (because that's how I originally interpreted your question, before I noticed that the example uses a one-past-the-end pointer instead ;))

In general, you're not even allowed to create an out-of-bounds pointer. A pointer must point to an element within the array, or one past the end. Nowhere else.

The pointer is not even allowed to exist, which means you're obviously not allowed to dereference it either. ;)

Here's what the standard has to say on the subject:

5.7:5:

When an expression that has integral type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integral expression. In other words, if the expression P points to the i-th element of an array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N (where N has the value n) point to, respectively, the i+n-th and i−n-th elements of the array object, provided they exist. Moreover, if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object, and if the expression Q points one past the last element of an array object, the expression (Q)-1 points to the last element of the array object. If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.

(emphasis mine)

Of course, this is for operator+. So just to be sure, here's what the standard says about array subscripting:

5.2.1:1:

The expression E1[E2] is identical (by definition) to *((E1)+(E2))

Of course, there's an obvious caveat: Your example doesn't actually show an out-of-bounds pointer. It uses a "one past the end" pointer, which is different. The pointer is allowed to exist (as the above says), but the standard, as far as I can see, says nothing about dereferencing it. The closest I can find is 3.9.2:3:

[Note: for instance, the address one past the end of an array (5.7) would be considered to point to an unrelated object of the array’s element type that might be located at that address. —end note ]

Which seems to me to imply that yes, you can legally dereference it, but the result of reading or writing to the location is unspecified.

Thanks to ilproxyil for correcting the last bit here, answering the last part of your question:

  • array + 5 doesn't actually dereference anything, it simply creates a pointer to one past the end of array.
  • &array[4] + 1 dereferences array+4 (which is perfectly safe), takes the address of that lvalue, and adds one to that address, which results in a one-past-the-end pointer (but that pointer never gets dereferenced).
  • &array[5] dereferences array+5 (which as far as I can see is legal, and results in "an unrelated object of the array’s element type", as the above said), and then takes the address of that element, which also seems legal enough.

So they don't do quite the same thing, although in this case, the end result is the same.

jalf
this pointer he is trying to create is one past the end...
Evan Teran
Matthew Flaschen
Evan Teran
@Evan: Yeah, I realized that too, and edited my post. Note that the question title asks about out-of-bounds though. The answer should describe both cases now.
jalf
Oh, I should probably mention that this is based on draft n1905 (from 2005). I don't have access to the "real" standard here, and this was the first one Google turned up.
jalf
@ilproxyil: You're right. Fixed it. Hopefully, that's all. SO is starting to throw CAPTCHA's at me now for repeatedly editing this post... ;)
jalf
@Martin: Don't you start, I'm tired of editing this thing. ;) Anyway, the standard says that array[5] is equivalent by definition to *(array + 5), so surely it *does* dereference the address. Or am I missing something (again)? ;)
jalf
@ilproxy: array[5] does not de-reference the address. You can consider it as an expression that is a 'reference to value'. It only de-references the address if it is used to retrieve the value or write to the value. Here we are taking the address. This is explicitly allowed by the standard.
Martin York
@jalf: Yes, you are missing something. It is a reference to an rvalue.
Martin York
Which is? Where do you get the rvalue from?
jalf
David Thornley
@Martin, array[5] surely dereferences (look at 3.8/5 and 8.3.2/4, for example) It just does not read the stored value located there.
Johannes Schaub - litb
Johannes Schaub - litb
I'm accepting this answer even though I also like several others. This one references the standards. I'd also accept Adam Rosenfield's if I could.
Zan Lynx
Adam Rosenfield
@jalf, also the whole text that note is in starts with "If an object of type T is located at an address A..." <- That says "The following text assumes there is an object at address A." So your quote doesn't (and can't, under this condition) say that there is always an object at address A.
Johannes Schaub - litb
True, on both points. I guess I should have read the full text before that note. ;) But yeah, I'm not sure either. Even if dereferencing it is well-defined, then obviously the state of the object you access is not. So you might be able to take the address of it, but nothing else really.
jalf
Section 5.3.1.1 Unary operator '*': 'the result is an lvalue referring to the object or function'. Section 5.2.1 Subscripting: The expression E1[E2] is identical (by definition) to *((E1)+(E2)). By my reading of the standard here, there is no de-referencing of the resulting pointer.
Martin York
@litb: so are we saying that for a T* which points at one past the end of an array, that there is an object pointed to by the pointer - even if it's only a byte (which is an object) and not actually a T - and therefore unary * is well defined, returning an lvalue of type T but which may not actually be a complete T? Sounds like a plausible interpretation.
Charles Bailey
@Charles, yes that's what i think is going on. It would be all fine, as long as you don't try to read a value (lvalue->rvalue). If you would try, you would fall into 3.10/15 and 4.1/1. Thus, this would be well defined always: unsigned char c[1]; unsigned char c1 = c[1]; But this not always, because you don't know what might be located there besides that byte: float s[1]; float s1 = s[1]; But contrary, this is always fine, i think: s[1]; (no read happening).
Johannes Schaub - litb
A: 

If your example is NOT a general case but a specific one, then it is allowed. You can legally, AFAIK, move one past the allocated block of memory. It does not work for the generic case, though, i.e. where you are trying to access elements more than one past the end of an array.

Just searched C-Faq : link text

Aditya Sehgal
the top answer says "its legal" and I also say the same thing. Why the down vote then :). Is something wrong with my answer?
Aditya Sehgal
+2  A: 

I don't believe that it is illegal, but I do believe that the behaviour of &array[5] is undefined.

  • 5.2.1 [expr.sub] E1[E2] is identical (by definition) to *((E1)+(E2))

  • 5.3.1 [expr.unary.op] unary * operator ... the result is an lvalue referring to the object or function to which the expression points.

At this point you have undefined behaviour because the expression ((E1)+(E2)) didn't actually point to an object, and the standard only says what the result should be when it does.

  • 1.3.12 [defns.undefined] Undefined behaviour may also be expected when this International Standard omits the description of any explicit definition of behaviour.

As noted elsewhere, array + 5 and &array[0] + 5 are valid and well defined ways of obtaining a pointer one beyond the end of array.

Charles Bailey
The key point is: "the result of '*' is an lvalue". From what I can tell, it only becomes UB iff you have an lvalue to rvalue conversion on that result.
Richard Corden
I would contend that, as the result of '*' is only defined in terms of the object pointed to by the expression to which the operator is applied, it is undefined - by omission - what the result is if the expression didn't have a value which actually referred to an object. It's far from clear, though.
Charles Bailey
+3  A: 

I believe that this is legal, and it depends on the 'lvalue to rvalue' conversion taking place. The last line of core issue 232 has the following:

We agreed that the approach in the standard seems okay: p = 0; *p; is not inherently an error. An lvalue-to-rvalue conversion would give it undefined behavior

Although this is a slightly different example, what it does show is that the '*' does not result in an lvalue to rvalue conversion. So, given that the expression is the immediate operand of '&', which expects an lvalue, the behaviour is defined.

Richard Corden
+1 for the interesting link. I'm still not sure that I agree that p=0;*p; is well defined as I'm not convinced that '*' is well defined for an expression whose value is not a pointer to an actual object.
Charles Bailey
A statement that's an expression is legal, and means to evaluate that expression. *p is an expression that invokes undefined behavior, so anything the implementation does is according to the standard (including emailing your boss, or downloading baseball statistics).
David Thornley
+3  A: 

In addition to the above answers, I'll point out operator& can be overridden for classes. So even if it was valid for PODs, it probably isn't a good idea to do for an object you know isn't valid (much like overriding operator&() in the first place).

Todd Gardner
David Rodríguez - dribeas
+5  A: 

Yes, it's legal. From the C99 draft standard:

§6.5.2.1, paragraph 2:

A postfix expression followed by an expression in square brackets [] is a subscripted designation of an element of an array object. The definition of the subscript operator [] is that E1[E2] is identical to (*((E1)+(E2))). Because of the conversion rules that apply to the binary + operator, if E1 is an array object (equivalently, a pointer to the initial element of an array object) and E2 is an integer, E1[E2] designates the E2-th element of E1 (counting from zero).

§6.5.3.2, paragraph 3 (emphasis mine):

The unary & operator yields the address of its operand. If the operand has type ‘‘type’’, the result has type ‘‘pointer to type’’. If the operand is the result of a unary * operator, neither that operator nor the & operator is evaluated and the result is as if both were omitted, except that the constraints on the operators still apply and the result is not an lvalue. Similarly, if the operand is the result of a [] operator, neither the & operator nor the unary * that is implied by the [] is evaluated and the result is as if the & operator were removed and the [] operator were changed to a + operator. Otherwise, the result is a pointer to the object or function designated by its operand.

§6.5.6, paragraph 8:

When an expression that has integer type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integer expression. In other words, if the expression P points to the i-th element of an array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N (where N has the value n) point to, respectively, the i+n-th and i−n-th elements of the array object, provided they exist. Moreover, if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object, and if the expression Q points one past the last element of an array object, the expression (Q)-1 points to the last element of the array object. If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined. If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.

Note that the standard explicitly allows pointers to point one element past the end of the array, provided that they are not dereferenced. By 6.5.2.1 and 6.5.3.2, the expression &array[5] is equivalent to &*(array + 5), which is equivalent to (array+5), which points one past the end of the array. This does not result in a dereference (by 6.5.3.2), so it is legal.

Adam Rosenfield
Interesting, so it's legal and explicitly well defined in C which *may* be different from C++ (see other discussions!).
Charles Bailey
He explicitly asked about C++. This is the kind of subtle difference that cannot be relied upon when porting between the two.
Matthew Flaschen
He asked about both: "It would also be interesting to know if it meets the C standard."
Charles Bailey
Adam Rosenfield
The C standard is a normative reference in the C++ standard. That means that provisions in the C standard that are referenced by the C++ standard are part of the C++ standard. It does not mean that everything in the C standard applies. In particular Annex C is informative, not normative, so just because a difference isn't highlighted in this section doesn't mean that the C 'version' applies to C++.
Charles Bailey
Johannes Schaub - litb
@Charles Bailey: There's a difference between the C standard (which is probably C89, or C90?) and C99, which was standardised after the first C++ standard (i.e. C++98). IMHO, the C++ committee has tried to incorporate C99 fixes and additions where possible, but sometimes it just seems that C99 has solved problems in ways that make compatibility difficult at best. Either way, what you say does not apply to C99, only to the earlier standard.
Richard Corden
C89 was the C standard published by ANSI in 1989; C90 was the C standard published by ISO in 1990. They are essentially identical; I don't know if they are 100% identical. In any case, though, you're right -- the current C++ standard, C++03, refers to C90, not to C99. I don't know if the next C++ standard, C++0x, will refer to C90 or C99.
Adam Rosenfield
Richard Corden
Richard Corden
+1  A: 

C++ standard, 5.19, paragraph 4:

An address constant expression is a pointer to an lvalue....The pointer shall be created explicitly, using the unary & operator...or using an expression of array (4.2)...type. The subscripting operator []...can be used in the creation of an address constant expression, but the value of an object shall not be accessed by the use of these operators. If the subscripting operator is used, one of its operands shall be an integral constant expression.

Looks to me like &array[5] is legal C++, being an address constant expression.

David Thornley
Charles Bailey
I don't think it matters whether the array is static or stack-allocated.
David Thornley
It does if you're referencing 5.19. The part that you elided with ... says "... designating an object of static storage duration, a string literal or a function. ...". This means that if your expression involves a stack-allocated array, you can't use 5.19 to reason about the validity of those expressions.
Charles Bailey
+5  A: 

Just to put it all together, so we can compare the different ideas that arose in the different answers. I'll comment on what i think about each. Community wiki, because this is merely a collection of other people's thoughts :) All emphasis below is mine.

First, we have to consider whether the pointer to one past the last element refers to an object. An array of bound N has N sub-objects that are its elements, as explained in 8.3.4/1

An object of array type contains a contiguously allocated non-empty set of N sub-objects of type T. - 8.3.4/1

To my knowledge, there is no mention in the Standard about an object located just after an array. If there is such an object, we are allowed to dereference the pointer that points one past the end, because of the following text and clarifying note

If an object of type T is located at an address A, a pointer of type cv T* whose value is the address A is said to point to that object, regardless of how the value was obtained. [Note: for instance, the address one past the end of an array (5.7) would be considered to point to an unrelated object of the array’s element type that might be located at that address. ] - 3.9.2/3

This is meant to say that the following is well defined, if the implementation lays the objects in a way that the storage of b is allocated directly behind the array object (which you can get manually if you overallocate some chunk of memory using malloc, assigning to a pointer to an array having a smaller size - i will keep it simple and only illustrate using the following example)

int a[3], b;
*(a + 3) = 0;
assert(b == 0 && (a + 3 == &b) && a[3] == 0);

The consensus among a few people is that your shown expression, &array[5], is undefined behavior. This is based on the fact, which stands, that the Standard says at 3.10/2 and 5.3.1/1

An lvalue refers to an object or function. - 3.10/2

The unary * operator performs indirection: the expression to which it is applied shall be a pointer to an object type, or a pointer to a function type and the result is an lvalue referring to the object or function to which the expression points. - 5.3.1/1

Above, we've seen that we are not guaranteed that there is an object (of the same type) after the last element of an array allocated. This should be kept different from another case, which happens when you have an object allocated (memory reserved), but that object has not started lifetime yet, as it happens if you allocate memory with malloc, and are going to placement-new an object into that area: Then you are allowed to dereference the area before you invoke the constructor, as long as you happen to keep some simple rules, like not trying to read a value out of the generated lvalue (3.8/5 and 3.8/6)

The interesting thing is, what happens when the lvalue does not refer to an object? Remember that an lvalue has to refer to an object (or function).

The Standard makes this operation well-defined at 5.2.8/2, talking about the typeid operator, which evaluates lvalue expression operands.

If the lvalue expression is obtained by applying the unary * operator to a pointer and the pointer is a null pointer value (4.10), the typeid expression throws the bad_typeid exception. - 5.2.8/2

This is contrary to 3.10/2, which requires that an lvalue expression refers to an object or function, which a null pointer value does not refer to. At this point, we have a defect in the Standard: one place allows dereferencing a null pointer in a way that contradicts another part of the Standard. This was observed long ago, and is being discussed in the linked issue report. As one person there notes, typeid simply treats a dereferenced null pointer specially to circumvent the lvalue-without-object problem. Since the text starts out by talking about an lvalue, that is at least a problematic way of handling it currently.

The idea to generally handle this, is to introduce an empty lvalue that purposely refers to no object or function. If we try to read a value out of it, we get undefined behavior. As long as we don't, we do not. Dereferencing a past-the-end address could yield such an empty lvalue, as we can't be sure usually whether there is an object located or not.

However, as the discussions on that report indicate, there are still outstanding issues (like, what happens with our overallocating case?) before it can be incorporated into the Standard.

Conclusion

I believe there is neither a right nor a wrong way about it. While i have a slight tendency to view this as generally undefined behavior, because there is no lvalue that doesn't refer to an object, i also see the current, quite problematic way typeid handles this problem. Since this concerns an active issue in the Standard, the best you can do is use addition to get the pointer value, instead of dereferencing past-the-end, thus avoiding the problem altogether.

Note that all the above is no problem in C. C makes it all well-formed by saying that &* is nearly a no-op, except that it makes the pointer an rvalue, so you can't do

(&*a) = NULL;

The same simple thing, sadly, isn't true about C++, though.

Johannes Schaub - litb
+1  A: 

This is legal:

int array[5];
int *array_begin = &array[0];
int *array_end = &array[5];

Section 5.2.1 Subscripting The expression E1[E2] is identical (by definition) to *((E1)+(E2))

So by this we can say that array_end is equivalent to:

int *array_end = &(*((array) + 5)); // or &(*(array + 5))

Section 5.3.1.1 Unary operator '*': The unary * operator performs indirection: the expression to which it is applied shall be a pointer to an object type, or a pointer to a function type and the result is an lvalue referring to the object or function to which the expression points. If the type of the expression is “pointer to T,” the type of the result is “T.” [ Note: a pointer to an incomplete type (other than cv void) can be dereferenced. The lvalue thus obtained can be used in limited ways (to initialize a reference, for example); this lvalue must not be converted to an rvalue, see 4.1. — end note ]

The important part of the above:

'the result is an lvalue referring to the object or function'.

The unary operator '*' is returning an lvalue referring to the int (no de-reference). The unary operator '&' then gets the address of the lvalue.

As long as there is no de-referencing of an out of bounds pointer then the operation is fully covered by the standard and all behavior is defined. So by my reading the above is completely legal.

The fact that a lot of the STL algorithms depend on the behavior being well defined is a sort of hint that the standards committee has already thought of this, and I am sure there is something that covers this explicitly.

Martin York
A: 

It is perfectly legal.

The vector<> template class from the STL does exactly this when you call myVec.end(): it gets you a pointer (here as an iterator) which points one element past the end of the array.

codymanix