views:

144

answers:

3

Please consider the following code.

enum type {CONS, ATOM, FUNC, LAMBDA};

typedef struct{
  enum type type;
} object;

typedef struct {
  enum type type;
  object *car;
  object *cdr;
} cons_object;

object *cons (object *first, object *second) {
  cons_object *ptr = (cons_object *) malloc (sizeof (cons_object));
  ptr->type = CONS;
  ptr->car = first;
  ptr->cdr = second;
  return (object *) ptr;
}

In the cons function, variable ptr is of type cons_object*. But in the return value it is converted to type of object*.

  1. I am wondering how this is possible because cons_object and object are different structs.
  2. Are there any issues in doing stuff like this?

Any thoughts!

+7  A: 

This is fine and is a fairly common technique for implementing "object-orientation" in C. Because the memory layout of structs is well-defined in C, as long as the two object share the same layout then you can safely cast pointers between them. That is, the offset of the type member is the same in the object struct as it is in the cons_object struct.

In this case, the type member tells the API whether the object is a cons_object or foo_object or some other kind of object, so you might be see something like this:

void traverse(object *obj)
{
    if (obj->type == CONS) {
        cons_object *cons = (cons_object *)obj;
        traverse(cons->car);
        traverse(cons->cdr);
    } else if (obj->type == FOO) {
        foo_object *foo = (foo_object *)obj;
        traverse_foo(foo);
    } else ... etc
}

More commonly, I've seem implementations where the "parent" class is defined as the first member of the "child" class, like so:

typedef struct {
    enum type type;
} object;

typedef struct {
    object parent;

    object *car;
    object *cdr;
} cons_object;

This works in largely the same way, except you've got a strong gaurantee that the memory layout of the child "classes" will be the same as the parents. That is, if you add a member to the 'base' object, it'll automatically be picked up by the children and you won't have to manually make sure all of the structures are in sync.

Dean Harding
interesting. this knowledge is going to simplify my code a lot. thanks.
Appu
You should mention that this is legitimate C with well-defined behavior, and not a "hack" or invocation of "undefined behavior".
R..
Joe
+3  A: 

To add to Dean's answer, here's something about pointer conversions in general. I forgot what the term is for this, but a pointer to pointer cast performs no conversion (in the same way int to float is). It is simply a reinterpretation of the bits that they point to (all for the compiler's benefit). "Non-destructive conversion" I think it was. The data doesn't change, only how the compiler interprets what is being pointed at.

e.g.,
If ptr is a pointer to an object, the compiler knows that there is a field with a particular offset named type of type enum type. On the other hand if ptr is cast to a pointer to a different type, cons_object, again it will know how to access fields of the cons_object each with their own offsets in a similar fashion.

To illustrate imagine the memory layout for a cons_object:

                    +---+---+---+---+
cons_object *ptr -> | t | y | p | e | enum type
                    +---+---+---+---+
                    | c | a | r |   | object *
                    +---+---+---+---+
                    | c | d | r |   | object *
                    +---+---+---+---+

The type field has offset 0, car is 4, cdr is 8. To access the car field, all the compiler needs to do is add 4 to the pointer to the structure.

If the pointer was cast to a pointer to an object:

                    +---+---+---+---+
((object *)ptr)  -> | t | y | p | e | enum type
                    +---+---+---+---+
                    | c | a | r |   |
                    +---+---+---+---+
                    | c | d | r |   |
                    +---+---+---+---+

All the compiler needs to know is that there is a field called type with offset 0. Whatever is in memory is in memory.

The pointers don't even have to be related whatsoever. You can have a pointer to an int and cast it to a pointer to an cons_object. If you were to access the car field, it's just like any ordinary memory access. It has a certain offset from the beginning of the structure. In this case, what is in that memory location is unknown but that's unimportant. To access a field, only the offset is needed and that information is found in the type's definition.

A pointer to an int points to a block of memory:

                        +---+---+---+---+
int             *ptr -> | i | n | t |   | int
                        +---+---+---+---+

Casted to a cons_object pointer:

                        +---+---+---+---+
((cons_object *)ptr) -> | i | n | t |   | enum type
                        +---+---+---+---+
                        | X | X | X | X | object *
                        +---+---+---+---+
                        | X | X | X | X | object *
                        +---+---+---+---+
Jeff M
awesome!. Thanks for the detailed post!
Appu
+1  A: 

Using separate structs violates the strict aliasing rule and is undefined behaviour: http://cellperformance.beyond3d.com/articles/2006/06/understanding-strict-aliasing.html

Using an embedded struct as in Dean's last example is fine.

Secure