ansaurus

Question

Answer 1

+5 A:

C gives you sufficient guarantees that your first approach will work. The only modification you need to make is that in order to make the pointer aliasing OK, you must have a union in scope that contains all of the structs that you are casting between:

union allow_aliasing {
    struct Class class;
    struct Object object;
    struct Integer integer;
    struct String string;
};

(You don't need to ever use the union for anything - it just has to be in scope)

I believe the relevant part of the standard is this:

[#5] With one exception, if the value of a member of a union object is used when the most recent store to the object was to a different member, the behavior is implementation-defined. One special guarantee is made in order to simplify the use of unions: If a union contains several structures that share a common initial sequence (see below), and if the union object currently contains one of these structures, it is permitted to inspect the common initial part of any of them anywhere that a declaration of the completed type of the union is visible. Two structures share a common initial sequence if corresponding members have compatible types (and, for bit-fields, the same widths) for a sequence of one or more initial members.

(This doesn't directly say it's OK, but I believe that it does guarantee that if two structs have a common initial sequence and are put into a union together, they'll be laid out in memory the same way - it's certainly been idiomatic C for a long time to assume this, anyway).
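As a minimal sketch of the case the quoted paragraph covers most directly, the snippet below stores one struct in the union and inspects the common initial part through another member. The struct definitions are assumptions (the question's declarations are not reproduced in this thread), and `class_via_union` is a hypothetical helper name.

```c
#include <stddef.h>

/* Assumed definitions: the question's own declarations are not shown here. */
struct Class   { const char *name; };
struct Object  { struct Class *class; };
struct Integer { struct Class *class; int value; };
struct String  { struct Class *class; size_t length; char *characters; };

union allow_aliasing {
    struct Class   class;
    struct Object  object;
    struct Integer integer;
    struct String  string;
};

/* Hypothetical helper: store an Integer in the union, then read the
   shared leading member (the class pointer) through the Object member.
   Object and Integer share a common initial sequence of one member. */
struct Class *class_via_union(struct Class *cls, int value)
{
    union allow_aliasing u;
    u.integer.class = cls;
    u.integer.value = value;
    return u.object.class;
}
```

Here the access goes through the union object itself. Whether merely having the union in scope is enough for plain pointer casts is the point debated in the comments below.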

caf 2009-09-28 05:13:32

I would probably upvote you, but I'm too tired to really be able to think about code right now.

Chris Lutz 2009-09-28 05:17:47

The requirement for the union is pretty close to purely theoretical though. The reason is pretty simple: if you create one of these structs, and pass it to code in another translation unit, and that TU does define the union, the struct has to be compatible. Since the compiler doesn't know about any other TUs, it's left with only one choice: assure the structs are compatible in case you might...

Jerry Coffin 2009-09-28 05:22:49

Jerry: Sure, you know that they'll be laid out the same way in memory - but in the absence of the union the compiler is free to optimise under the assumption that if you modify an object of type `struct String`, no objects of type `struct Object` will be changed. This is known as "strict aliasing".

caf 2009-09-28 05:33:14

@caf: that could only possibly apply if the variables were of the union type -- it can't apply to the separate structures. At the least, the code would have to be using the union to get the guarantee provided by the quoted section (where does it appear in the C99 standard, BTW?).

Jonathan Leffler 2009-09-28 05:58:41

@caf: section 6.2.5 of ISO 9899:1999 says: _A structure type describes a sequentially allocated nonempty set of member objects (and, in certain circumstances, an incomplete array), each of which has an optionally specified name and possibly distinct type._ Section 6.7.2.1 also says: _As discussed in 6.2.5, a structure is a type consisting of a sequence of members, whose storage is allocated in an ordered sequence, and a union is a type consisting of a sequence of members whose storage overlap._

Jonathan Leffler 2009-09-28 06:06:15

Answer 2

+2 A:

The problem is that, as far as I know, the C standard makes no promises about how structures are stored. On my platform this works. But on another platform struct String might store value before class and when I accessed foo->class in the above I would actually be accessing foo->value, which is obviously bad. Portability is a big goal here.

I believe you're wrong here. First, because your struct String doesn't have a value member. Second, because I believe C does guarantee the layout in memory of your struct's members. That's why the following are different sizes:

struct ShortFirst {
    short a;
    char  b;
    char  c;
};

struct CharFirst {
    char  a;
    short b;
    char  c;
};

If C made no guarantees, then compilers would probably optimize both of those to be the same size. But it guarantees the internal layout of your structs, so the natural alignment rules kick in and make the second one larger than the first.
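A small sketch of how this could be checked, assuming the two structures are given the hypothetical tags `ShortFirst` and `CharFirst`. The member ordering it tests is guaranteed by the standard; the actual sizes are implementation-defined, so the code does not hard-code them.

```c
#include <stddef.h>

/* Hypothetical tags for the two structures being compared. */
struct ShortFirst { short a; char b; char c; };
struct CharFirst  { char a; short b; char c; };

/* Returns 1 if the members of struct CharFirst sit at increasing
   offsets in declaration order, starting at offset 0. */
int members_in_declaration_order(void)
{
    return offsetof(struct CharFirst, a) == 0
        && offsetof(struct CharFirst, a) < offsetof(struct CharFirst, b)
        && offsetof(struct CharFirst, b) < offsetof(struct CharFirst, c);
}

/* Bytes of padding the implementation added to struct CharFirst. */
size_t char_first_padding(void)
{
    return sizeof(struct CharFirst)
         - (sizeof(char) + sizeof(short) + sizeof(char));
}
```

On a typical platform where `short` is 2 bytes with 2-byte alignment, `sizeof(struct ShortFirst)` is 4 and `sizeof(struct CharFirst)` is 6, but a conforming implementation may choose differently.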

Chris Lutz 2009-09-28 05:16:22

Care to correct whatever you find factually inaccurate? Or do you just want to downvote?

Chris Lutz 2009-09-28 05:22:36

I didn't downvote, but C definitely does not guarantee the layout in memory of member variables. However, you ARE guaranteed that you can always cast a pointer to a struct to a pointer to the first member of the struct.

Falaina 2009-09-28 05:50:58

+1: It looks fine to me, too. I suppose the most pedantic could argue that on a machine where there is insufficient penalty for misaligned access to the short member, the structures could be the same size; I'm not aware of such a machine. And some compilers support a pragma to achieve that effect. Nevertheless, where portability is the goal (as stated in the question), the only safe assumption is that the two structures will have different sizes.

Jonathan Leffler 2009-09-28 05:55:06

@Falaina: you are guaranteed that the sequence of the members is as written in the structure declaration.

Jonathan Leffler 2009-09-28 05:55:43

The only downvote I can imagine is because `short` and `char` might be the same size on some machine, but it seems really obtuse to downvote for a technicality in a simple example meant to demonstrate a point.

Chris Lutz 2009-09-28 05:58:12

@Chris - I've quoted chapter and verse on where the standard justifies the conclusion you gave.

Jonathan Leffler 2009-09-28 06:11:14

@Jonathan - I love it when someone who has and knows the standard can quote it for my edification. And for about the millionth time, I really need to sit down and read it.

Chris Lutz 2009-09-28 06:13:12

Answer 3

+2 A:

I appreciate the pedantic issues raised by this question and answers, but I just wanted to mention that CPython has used similar tricks "more or less forever" and it's been working for decades across a huge variety of C compilers. Specifically, see object.h, macros like PyObject_HEAD, structs like PyObject: all kinds of Python Objects (down at the C API level) are getting pointers to them forever cast back and forth to/from PyObject* with no harm done. It's been a while since I last played sea lawyer with an ISO C Standard, to the point that I don't have a copy handy (!), but I do believe that there are some constraints there that should make this keep working as it has for nearly 20 years...

Alex Martelli 2009-09-28 05:27:18

Alex: You might be interested in this article on strict aliasing: http://cellperformance.beyond3d.com/articles/2006/06/understanding-strict-aliasing.html

caf 2009-09-28 05:37:41

On the other hand, see PEP 3123 (http://www.python.org/dev/peps/pep-3123/) for why Python changed the definition of PyObject_HEAD in Py3k to conform to standard C.

Josh Haberman 2009-09-28 07:13:49

Answer 4

+3 A:

Section 6.2.5 of ISO 9899:1999 (the C99 standard) says:

A structure type describes a sequentially allocated nonempty set of member objects (and, in certain circumstances, an incomplete array), each of which has an optionally specified name and possibly distinct type.

Section 6.7.2.1 also says:

As discussed in 6.2.5, a structure is a type consisting of a sequence of members, whose storage is allocated in an ordered sequence, and a union is a type consisting of a sequence of members whose storage overlap.

[...]

Within a structure object, the non-bit-field members and the units in which bit-fields reside have addresses that increase in the order in which they are declared. A pointer to a structure object, suitably converted, points to its initial member (or if that member is a bit-field, then to the unit in which it resides), and vice versa. There may be unnamed padding within a structure object, but not at its beginning.

This guarantees what you need.
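A minimal sketch of what the last quoted paragraph promises, assuming a hypothetical `struct Integer` whose first member is the class pointer (the question's own declarations are not reproduced in this thread):

```c
#include <stddef.h>

/* Assumed definitions for illustration only. */
struct Class   { const char *name; };
struct Integer { struct Class *class; int value; };

/* Returns 1 if a pointer to the structure and a pointer to its
   initial member refer to the same address, i.e. there is no
   padding before the first member. */
int first_member_at_start(const struct Integer *i)
{
    return (const void *)i == (const void *)&i->class
        && offsetof(struct Integer, class) == 0;
}
```

Because `class` is declared before `value`, its offset is also guaranteed to be the smaller of the two.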

In the question you say:

The problem is that, as far as I know, the C standard makes no promises about how structures are stored. On my platform this works.

This will work on all platforms. It also means that your first alternative - what you are currently using - is safe enough.

But on another platform struct ~~String~~Integer might store value before class and when I accessed foo->class in the above I would actually be accessing foo->value, which is obviously bad. Portability is a big goal here.

No compliant compiler is allowed to do that. [I replaced String by Integer assuming you were referring to the first set of declarations. On closer examination, you might have been referring to the structure with an embedded union. The compiler still isn't allowed to reorder class and value.]

Jonathan Leffler 2009-09-28 06:10:14

The sections you cite guarantee the layout of the struct. However, the standard also says "An object shall have its stored value accessed only by an lvalue expression that has one of the following types:", followed by a list of permitted types (6.5, paragraph 7). Accessing an `Integer` through an `Object*` is undefined AFAIK, and could cause inappropriate optimizations to be performed. This is why Python stopped using this style; see http://www.python.org/dev/peps/pep-3123/ .

Josh Haberman 2009-09-28 07:28:03

This is good news; at least for compilers that comply with this part of the C99 standard, my code will work.

Imagist 2009-09-28 07:28:38

@Josh Haberman: I'll have to read the PEP rather more carefully than I'm willing to at this time of night. However, superficially, it looks like the fix is very similar to the code above. I presume I'm missing something.

Jonathan Leffler 2009-09-28 08:37:36

Answer 5

+1 A:

See Python PEP 3123 (http://www.python.org/dev/peps/pep-3123/) for how Python solves this problem using standard C. The Python solution can be directly applied to your problem. Essentially you want to do this:

struct Object { struct Class* class; };
struct Integer { struct Object object; int value; };
struct String { struct Object object; size_t length; char* characters; };

You can safely cast Integer* to Object*, and Object* to Integer* if you know that your object is an integer.
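A minimal sketch of those two conversions, assuming a hypothetical `struct Class` with a `name` field and a hypothetical global `integer_class` used as the type tag; the helper names are illustrative, not part of the question.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed definitions for illustration only. */
struct Class   { const char *name; };
struct Object  { struct Class *class; };
struct Integer { struct Object object; int value; };

/* Hypothetical class descriptor shared by all Integer objects. */
struct Class integer_class = { "Integer" };

/* Upcast: taking the address of the embedded Object needs no cast. */
struct Object *integer_as_object(struct Integer *i)
{
    return &i->object;
}

/* Downcast: only valid when obj really is the first member of a
   struct Integer, which the class pointer is used to check here. */
struct Integer *object_as_integer(struct Object *obj)
{
    assert(obj->class == &integer_class);
    return (struct Integer *)obj;
}
```

The downcast relies on the rule that a pointer to a structure, suitably converted, points to its initial member and vice versa.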

Josh Haberman 2009-09-28 07:18:12

Thanks for that link; I learned a lot from it.

Imagist 2009-09-28 07:42:04

According to your link, it looks like this can be done with less indirection than your code; specifically: "[I]f a `struct` starts with an `int`, the `struct *` may also be cast to an `int *`, allowing to write int values into the first field." This means that in this case the `struct Integer*` can be cast to a `struct Class**`, meaning that I don't have to change my declarations; I only need to be sure to reference the class through pointers (that's how I'm passing them around anyway).
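A sketch of that idea, assuming the original (non-nested) declarations look roughly like the ones below; `class_of` is a hypothetical helper name.

```c
#include <stddef.h>

/* Assumed declarations: every object type starts with a class pointer. */
struct Class   { const char *name; };
struct Integer { struct Class *class; int value; };
struct String  { struct Class *class; size_t length; char *characters; };

/* Reads the class pointer of any object whose first member is a
   struct Class *. The stored value is accessed through an lvalue of
   its own type (struct Class *), not through a different struct type. */
struct Class *class_of(void *obj)
{
    return *(struct Class **)obj;
}
```

This only covers the first member; reaching `value` or `length` still requires knowing the object's actual type.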

Imagist 2009-09-28 07:49:07


Representing dynamic typing in C.
