
I thought I really understood this, and re-reading the standard (ISO 9899:1990) just confirms my obviously wrong understanding, so now I ask here.

The following program crashes:

#include <stdio.h>
#include <stddef.h>

typedef struct {
    int array[3];
} type1_t;

typedef struct {
    int *ptr;
} type2_t;

type1_t my_test = { {1, 2, 3} };

int main(int argc, char *argv[])
{
    type1_t *type1_p =             &my_test;
    type2_t *type2_p = (type2_t *) &my_test;

    (void)argc;
    (void)argv;

    /* offsetof yields a size_t, so cast it to match %lu */
    printf("offsetof(type1_t, array) = %lu\n", (unsigned long)offsetof(type1_t, array)); /* 0 */
    printf("my_test.array[0]  = %d\n", my_test.array[0]);
    printf("type1_p->array[0] = %d\n", type1_p->array[0]);
    printf("type2_p->ptr[0]   = %d\n", type2_p->ptr[0]);  /* this line crashes */

    return 0;
}

Comparing the expressions my_test.array[0] and type2_p->ptr[0] according to my interpretation of the standard:

6.3.2.1 Array subscripting

"The definition of the subscript operator [] is that E1[E2] is identical to (*((E1)+(E2)))."

Applying this gives:

my_test.array[0]
(*((E1)+(E2)))
(*((my_test.array)+(0)))
(*(my_test.array+0))
(*(my_test.array))
(*my_test.array)
*my_test.array

type2_p->ptr[0]
(*((E1)+(E2)))
(*((type2_p->ptr)+(0)))
(*(type2_p->ptr+0))
(*(type2_p->ptr))
(*type2_p->ptr)
*type2_p->ptr

type2_p->ptr has type "pointer to int" and the value is the start address of my_test. *type2_p->ptr therefore evaluates to an integer object whose storage is at the same address that my_test has.

Further:

6.2.2.1 Lvalues, arrays, and function designators

"Except when it is the operand of the sizeof operator or the unary & operator, ... , an lvalue that has type array of type is converted to an expression with type pointer to type that points to the initial element of the array object and is not an lvalue."

my_test.array has type "array of int" and, as described above, is converted to "pointer to int" with the address of the first element as its value. *my_test.array therefore evaluates to an integer object whose storage is at the same address as the first element of the array.

And finally

6.5.2.1 Structure and union specifiers

A pointer to a structure object, suitably converted, points to its initial member ..., and vice versa. There may be unnamed padding within a structure object, but not at its beginning, as necessary to achieve the appropriate alignment.

Since the first member of type1_t is the array, the start address of that array and of the whole type1_t object is the same, as described above. My understanding was therefore that *type2_p->ptr evaluates to an integer whose storage is at the same address as the first element of the array, and thus is identical to *my_test.array.

But this cannot be the case, because the program crashes consistently on Solaris, Cygwin and Linux with gcc versions 2.95.3, 3.4.4 and 4.3.2, so any environmental issue is completely out of the question.

Where is my reasoning wrong/what do I not understand? How do I declare type2_t to make ptr point to the first member of the array?

+2  A: 

Where is my reasoning wrong/what do I not understand?

type1_t::array (not strictly C syntax) is not an int *; it is an int [3].

How do I declare type2_t to make ptr point to the first member of the array?

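typedef struct
{
    /* C99 flexible array member. C99 only allows one after at least one
       other named member, so a conforming compiler must diagnose this exact struct. */
    int ptr[];
} type2_t;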

That declares a flexible array member. From the C Standard (6.7.2.1 paragraph 16):

However, when a . (or ->) operator has a left operand that is (a pointer to) a structure with a flexible array member and the right operand names that member, it behaves as if that member were replaced with the longest array (with the same element type) that would not make the structure larger than the object being accessed; the offset of the array shall remain that of the flexible array member, even if this would differ from that of the replacement array.

I.e., it can alias type1_t::array properly.

MSN
+7  A: 

Please forgive me if I overlook anything in your analysis, but I think the fundamental bug in all of this is this wrong assumption:

type2_p->ptr has type "pointer to int" and the value is the start address of my_test.

There is nothing that gives it that value. Rather, it very probably holds something like

0x00000001

That is because the cast makes the compiler interpret the bytes that make up the integer array as a pointer. You then add something to that pointer and subscript it.

Also, I highly doubt that your cast to the other struct is actually valid (as in, guaranteed to work). You may cast and then read a common initial sequence of either struct if both of them are members of a union, but they are not in your example. You may also cast to a pointer to the first member. For example:

#include <assert.h>

typedef struct {
    int array[3];
} type1_t;

type1_t f = { { 1, 2, 3 } };

int main(void) {
    int (*arrayp)[3] = (int(*)[3])&f;
    (*arrayp)[0] = 3;
    assert(f.array[0] == 3);
    return 0;
}
Johannes Schaub - litb
Thank you for correctly pointing out my incorrect assumption (the `type2_t *type2_p = (type2_t *) ` type cast). Sorry for not accepting your answer, but I will select Chuck's answer, which I find a little more precise.
hlovdal
+5  A: 

An array is a kind of storage. Syntactically it is often used like a pointer, but physically there is no "pointer" variable in that struct, just the three ints. The int pointer, on the other hand, is an actual object stored in the struct. Therefore, when you perform the cast, you are probably* making ptr take on the value of the first element of the array, namely 1.

*I'm not sure this is actually defined behavior, but that's how it will work on most common systems at least.

Chuck
It definitely is defined behaviour. The address of ptr is the same as the address of the array member. The array is actually a pointer into the structure, whereas ptr is simply an integer pointer within a structure.
Vitali
"defined behavior" doesn't mean "something happens", it means "the something that happens is defined by the standard". Type punning is undefined behavior. If you want to see something surprising happen when you type pun, kick up the optimizations a notch or two on your compiler.
Logan Capaldo
A: 

It's got to be defined behaviour. Think about it in terms of memory.

For simplicity, assume my_test is at address 0x80000000.

type1_p == 0x80000000
&type1_p->array[0] == 0x80000000 // array[0] == 1
&type1_p->array[1] == 0x80000004 // array[1] == 2
&type1_p->array[2] == 0x80000008 // array[2] == 3

When you cast it to type2_t,

type2_p == 0x80000000
&type2_p->ptr == 0x80000000 // type2_p->ptr == 1
type2_p->ptr[0] == *(type2_p->ptr) == *1

To do what you want, you would have to either create a secondary structure and assign the address of the array to ptr (e.g. type2_p->ptr = type1_p->array) or declare ptr as an array (or as a flexible array member, e.g. int ptr[]).

Alternatively, you could access the elements in an ugly manner: (&type2_p->ptr)[0], (&type2_p->ptr)[1]. However, be careful here, since (&type2_p->ptr)[0] will actually be an int *, not an int. On 64-bit platforms with 32-bit int, for instance, it holds the first two ints at once: 0x200000001 (8589934593) on a little-endian machine such as x86-64, or 0x100000002 (4294967298) on a big-endian one.

Vitali