views:

425

answers:

4

Consider the following code :

void populate(int *arr)
{
   for(int j=0;j<4;++j)
       arr[j]=0;
}

int main()
{
   int array[2][2];
   populate(&array[0][0]);
}

There was a discussion regarding this on a local community whether the code is valid or not(Am I supposed to mention its name?). One guy was saying that it invokes UB because it violates

C++ Standard(section 5.7-5 [expr.add])

"If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined."

But I don't see anything wrong with the code,the code is perfectly OK for me.

So, I just want to know is this code valid or not? Am I missing something?

+14  A: 

Your array is two arrays of int[2], while your function populate() treats it as a single array of int[4]. Depending on exactly how the compiler decides to align the elements of array, this may not be a valid assumption.

Specifically, when j is 2 and you try to access arr[2], this is outside the bounds of main's array[0] and is therefore invalid.

Greg Hewgill
But when you access `arr[2]`, `arr` is not an "array [2] of anything", it is a pointer. I am not sure if this argument applies.
Alok
@Alok: In C++, multidimensional arrays are a different thing than arrays of pointers. Java only has the latter, but C and C++ have both forms.
Greg Hewgill
@Greg, OK. In C, there is nothing like "multidimensional arrays", there are just arrays (which in turn can be composed of arrays, and so on). Not sure if you were implying that C++ has true multidimensional arrays.
Alok
Sorry but some of this is garbage.1. int A[2][2]; is a multi-dimansion array. It is also an array of arrays (they are not mutually exclusive).2. The code is legal and will always work (C/C++ arrays are required to be contiguous).3. Never do it again. "Legal" != "Smart". If you handed that to me I'd break your knuckles. It is deliberately confusing.
Michael J
Michael J is correct in that elements are contiguous, however the behavior is still undefined, the implementation is allowed to perform bounds checking for example.
avakar
@avakar - The function "popuate()" gets a pointer to an integer. How can it do any bounds checking? You can check that the process owns the memory, which it does. You cannot reasonably check the object size.
Michael J
Michael J, because the pointer itself can possibly carry the necessary information (the pointers are then called *fat*). I think there even were real implementations that did this.
avakar
@Michael J: It doesn't matter *how* it'd do it. Just the fact that it is allowed is enough to render it undefined behavior.
jalf
I am now convinced that it is undefined behavior. Please see the link in my post or the explanation in my post.
Alok
@Greg/Alok: I'm not sure I understand what you mean by "depending on how the compiler decides to align the elements of array". What does align have to do with this? If I understood Alok's answer (please correct me if not), a standard-conformant compiler *has* to allocate the memory in a specific way for this definition. The undefined behaviour doesn't come from how the compiler chooses to lay out the memory, but by invalid indexing done later.
Edan Maor
I should note that while it may actually work with a stack allocated array, if it were dynamically allocated there would be a problem because of the book-keeping of the array size usually just placed before the first element of the array.
Matthieu M.
Alok
Emile Cormier
@Alok: So, if I inderstand correctly, the standard allows int* to be implemented as more than a mere 32/64-bit address and can have other "housekeeping" data? **Edit:** It seems you deleted the comment I'm replying to here.
Emile Cormier
@Emile: Yes, the standard doesn't care how a pointer is implemented as long as it behaves as expected. Pointers are not memory addresses. Think of compilers targeting Flashbytecode for example. Raw pointers don't exist in that environment, so pointers have to be compiled to... some other representation. A pointer is a C/C++ concept. Just because it is *typically* compiled down to a memory address, that doesn't mean it's required to be equivalent.
jalf
@Emile, yeah, my comment wasn't that well-formed, and I thought I would write it again, but by then jalf replied. Sorry.
Alok
Guys, if your interpretation were true then casting would be illegal. How would malloc work? It returns a void * pointer and is cast to whatever. You have misunderstood the standard.
Michael J
`void *` is not cast to whatever, it is guaranteed by the standard that a pointer to `void *` can be *converted* to any other pointer type. `malloc` guarantees alignment for any type. Also, a pointer can be converted to `void *` and back, and that round trip is guaranteed to yield the original pointer.
Alok
+3  A: 

That is not going to always work. C has arrays of arrays, not 2D arrays. The sub-arrays are not always specified to be contiguous in memory(static arrays might be, check C/C++ standard) In this particular example, I suspect it works correctly. However, if you had dynamically allocated the memory being passed in, you quite possibly would fail, because malloc(or new) might have put the subarrays quite far apart.

If, however, you want to linearly walk down '2d' memory, you can construct a 2D accessor against a 1D array and it will work fine and things like memset will work against the 1D array.

Paul Nathan
Arrays are required to have contiguous memory (section 6.2.5, bullet 10 in the ISO C99 standard). And if multidimensional arrays are defined to be arrays of arrays, then doesn't that imply that they too are contiguous (the n+1st array must immediately follow the n-th array in memory)?
jamesdlin
Oops, I meant bullet 20.
jamesdlin
They must be contiguous, but there might be implementation-defined padding between them. I'm not aware of this being a problem for `int` on any compiler I've used, though.
Mark Ransom
Ah, good point. Padding could be a problem for multidimensional `char` arrays.
jamesdlin
Where would the padding be? Between `array[i]`'s last element and `array[i+1]`'s first element? But `array[i]` and `array[i+1]` are successive elements of an array, so there can't be any padding.
Alok
malloc returns a block of contiguous memory. Treating the result of malloc as an array of int arrays will never result in 'subarrays quite far apart'.
Pete Kirkham
If there is padding in arrays, how do you portably allocate them dynamically? `int (*multi_array)[3] = malloc(rows * sizeof(int[3]));` wouldn't work by that reasoning ( http://en.wikibooks.org/wiki/C_Programming/Common_practices#Dynamic_multidimensional_arrays )
Pete Kirkham
The behavior is undefined not because of padding, but because "the standard says so".
Alok
The standard says that `int[3][rows]` is an array of objects of type `int[3]`, and that elements in arrays are contiguous. So why is the code I posted undefined?
Pete Kirkham
@Pete: `malloc` indeed allocates contiguous space, but the question isn't about `malloc` (or dynamically allocated memory) at all.
Alok
+7  A: 

You have an array of arrays. array[1] follows array[0] in memory, because arrays are contiguous. If p == array[0], then p[1] follows p[0], because arrays are contiguous. So, you are right: all the memory for array is contiguous.

In pictures, array looks like this.

+-----------------+-----------------+
|      [0]        |      [1]        |
+-----------------+-----------------+

Now, let's break down array[0] and array[1], they individually look like this:

+--------+--------+
|  [0]   |  [1]   |        
+--------+--------+

So, the final picture is:

+--------+--------+--------+--------+
| [0][0] | [0][1] | [1][0] | [1][1] |
+--------+--------+--------+--------+

Now, the question is, can you access this contiguous memory the way you are. The answer is, it is not guaranteed by the standard. The arrays are contiguous, but the standard doesn't allow indexing the way you have done. In other words:

&array[0][0]+2 == &array[1][0], but (&a[0][0] + 2) + 1 is undefined whereas &a[1][0] + 1 is valid. If this seems strange, it is, but as per the quote you posted from the standard, you are only allowed to calculate a pointer that is either inside an array or at most one past the array (without dereferencing that "one past" pointer).

In practice, I doubt that this would fail anywhere, but the according to the standard at least, your code is invalid because of undefined behavior.

See this post on comp.lang.c as well.

Alok
Shouldn't the last element in the third picture be "[1][1]"?
vobject
The question is not whether the memory is contiguous (it clearly is), but whether the behavior is well-defined. I don't see how this can be "half-right". He asks if it is UB, the answer is yes. The fact that it'll usually work doesn't make it any less undefined.
jalf
jalf, you are right. When I said "half-right", I meant that some people thought that the memory need not be contiguous, therefore the behavior is undefined. On the other hand, many other people thought that the memory has to be continuous, so the behavior is well-defined.
Alok
@vobject: fixed. @jalf: I have changed the wording of my answer.
Alok
What also needs to be added to the mix is the CPU configuration. For instance, the DEC memory model is reversed from the PC memory model. So the arrays would also be reversed and accessing them as a single array would come up differently.
Dave
My copy of the standard says this in 8.3.4 Arrays [dcl.arrays] "A consistent rule is followed for multidimensional arrays. If E is an n-dimensional array of rank i × j × . . . k, then E appearing in an expression is converted to a pointer to an (n − 1)-dimensional array with rank j× ... × k. If the * operator, either explicitly or implicitly as a result of subscripting, is applied to this pointer, the result is the pointed-to (n-1)-dimensional array, which itself is immediately converted into a pointer." This suggests that it may not be UB. as the array is converted to an array of size 4.
Michael Anderson
@Michael: No, what it says is that the two-dimensional array is converted to a single-dimensional array of size 2. It's really just stating the obvious. Dereferencing a N-dimensional array yields the N-1 dimensional array that is stored as one of the array entries. In our case, each array entry in the "outer" array stores a single-dimensional array of size two. And one of those is returned when you dereference the 2D array.
jalf
A: 

In C everything is stored in linear memory segments. You are passing address of a[0][0] which would be same as address of a[0] so a[i][j] is same as a[i*ColSize+j] because everything is stored linearly. But if you allocate memory dynamically it would fail because that time all rows might not be stored in contiguous location. then a[i][j] would be *(&a[i]+j).

GG
No -- the arrays are NOT required to be stored linearly by the standard. If it's declared as a two dimensional array, it's a two dimensional array -- not a single array big enough to hold all the sub-arrays.
Billy ONeal
BillyONeal, an array of arrays is still an array and its elements are stored contiguously.
avakar