views:

332

answers:

4

I'd like to find out how C will allocate a the data items of a multidimensional array, and if their allocation is consistent across machines.

I know that, at the lowest level, the data items are neighbours, but I don't know how they're arranged further up.

For example, if I allocate a 3D array as int threeD[10][5][6], can I assume that &(threeD[4][2][5]) + 1 == &(threeD[4][3][0])? On all machines?

Thanks in advance for your help.

+2  A: 

The elements are stored in Row Major order. So Elements along the last dimension are contiguous. However, elements between rows (as indicated by your example) aren't guaranteed to be contiguous. It depends on how the initial memory has been allocated.

#include <malloc.h>
#include <stdio.h>
#include <stdlib.h>

// only elements in a single row are guaranteed to be
// contiguous because of the multiple mallocs
void main(void)
{
// 3 rows, 4 columns
int *a[3];

for ( int row = 0; row < 3; row++ )
  a[row] = (int *)malloc(4*sizeof(int));
}


// all elements are guaranteed to be contiguous
// in a row major order.
void main(void)
{
// 3 rows, 4 columns
int *a[3];

int *buf = (int *)malloc(3*4*sizeof(int));

for ( int row = 0; row < 3; row++ )
  a[row] = buf+4*row;

assert( (&a[1][3] + 1) == &a[2][0] );
}
Phillip Ngan
Elements allocated like in the example rossmcf is giving are guaranteed to be contiguous between rows. The C standard is very specific about this point.
drhirsch
As I understood it, the original question was about language feature known as "multidimensional arrays", i.e. multidimensional arrays explicitly declared as such, hence the "...C allocate..." bit in the question. While the data structures you manually build in your example implement the concept of an "multi-dimensional array", none of these hand-made multidimensional arrays actually are language-level C multidimensional arrays.
AndreyT
+3  A: 

The C standard is very specific in equating array subscripting with pointer arithmetic, and specifies that arrays are stored in row major order.

Consider the array object defined by the declaration

int x[3][5];

Here x is a 3 x 5 array of ints; more precisely, x is an array of three element objects, each of which is an array of five ints. In the expression x[i], which is equivalent to (*((x)+(i))), x is first converted to a pointer to the initial array of five ints. Then i is adjusted according to the type of x, which conceptually entails multiplying i by the size of the object to which the pointer points, namely an array of five int objects. The results are added and indirection is applied to yield an array of five ints. When used in the expression x[i][j], that array is in turn converted to a pointer to the first of the ints, so x[i][j] yields an int.

Jerry Coffin
+4  A: 

Yes, arrays are stored in row major order across all implementations of C compilers.
The Standard says (I applied some reformatting):

6.5.2.1 Array subscripting
    Constraints

3   Successive subscript operators designate an element of a multidimensional
    array object.  
    If E is an n-dimensional array (n >= 2) with dimensions i * j * . . . * k,
    then E (used a s other than an lvalue) is converted to a pointer to an
    (n - 1)-dimensional array with dimensions j * . . . * k.
    If the unary * operator is applied to this pointer explicitly, or
    implicitly as a result of subscripting, the result is the pointed-to
    (n - 1)-dimensional array, which itself is converted into a pointer if
    used as other than an lvalue. It follows from this that arrays are stored
    in row-major order (last subscript varies fastest).
pmg
+1  A: 

Firstly, In C language address arithmetic is only defined within the boundaries of a given array. (I wanted to say "single-dimensional (SD) array", but technically all arrays in C are SD. Multi-dimensional arrays are built as SD arrays of SD arrays. And this view of arrays is the most appropriate for this topic). In C you can start from the pointer to the beginning of an array and move back and forth within that array using additive operations. You are not allowed to cross the boundaries of the array you started from, except that it is legal to form a pointer to an imaginary element that follows the last element. However, when it comes to accessing elements (reading and writing), you are only allowed to access the real, existing elements of the array you started from.

Secondly, in your example '&threeD[4][2][5] + 1' you are forming a pointer to the imaginary "past-the-last" element of array 'threeD[4][2]'. This by itself is legal. However, the language specification does not guarantee that this pointer is equal to the address of '&threeD[4][3][0]'. The only thing that it says is that it might be equal to it. It is true, that the other requirements imposed on arrays by the language specification pretty much "force" this relationship to hold. But it is not formally guaranteed. Some pedantic (to the point of being malicious) implementation is perfectly allowed to use some kind of compiler magic to break this relationship.

Thirdly, actually accessing '*(threeD[4][2][5] + 1)' is always illegal. Even if the pointer is pointing into the next array, the compiler is allowed to perform the necessary run-time checks and generate a segmentation fault, since you are using pointer arithmetic on 'threeD[4][2]' array and trying to access something outside its boundaries.

Fourthly, doing 'threeD[4][2][5] + 2', '...+ 3' etc. is always illegal for similar reasons (remember: one past the end is OK, but 2, 3 or more is illegal).

And finally, fifthly: yes I know that in many (if not most) (if not all) practical cases interpreting a 'T A[2][3][4]' array as a flat 'T A[2*3*4]' array will work. But, again, from the formal language point of view this is illegal. And don't be surprised if this perfectly working code will one day trigger a huge amount of warnings from some static or dynamic code analysis tool, if not from the compiler itself.

AndreyT