The Visual C++ 2008 C runtime offers an 'offsetof' operator, which is actually a macro defined as follows:

#define offsetof(s,m)   (size_t)&reinterpret_cast<const volatile char&>((((s *)0)->m))

This allows you to calculate the offset of the member variable m within the class s.
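To make the purpose concrete, here is a minimal sketch using the standard offsetof from <cstddef>. The Packet struct and the read_int_at helper are hypothetical names invented for this illustration, and the actual offsets depend on the platform's padding rules.

```cpp
#include <cassert>   // assert
#include <cstddef>   // offsetof, std::size_t
#include <cstring>   // std::memcpy

// Hypothetical POD struct, used only for illustration.
struct Packet {
    char   tag;
    int    length;
    double payload;
};

// Reads an int that lives `offset` bytes past `base`.
// `object_size` is only used to check that the read stays inside the object.
inline int read_int_at(const void* base, std::size_t object_size, std::size_t offset) {
    assert(offset + sizeof(int) <= object_size);
    int value;
    // memcpy avoids any alignment or aliasing concerns when reading the bytes.
    std::memcpy(&value, static_cast<const char*>(base) + offset, sizeof value);
    return value;
}
```

Given a Packet p, calling read_int_at(&p, sizeof p, offsetof(Packet, length)) should return p.length.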

What I don't understand in this declaration is:

  1. Why are we casting m to anything at all and then dereferencing it? Wouldn't this have worked just as well:

    &(((s*)0)->m) ?

  2. What's the reason for choosing char reference (char&) as the cast target?

  3. Why use volatile? Is there a danger of the compiler optimizing the loading of m? If so, in what exact way could that happen?

+2  A: 

An offset is in bytes. So to get a number expressed in bytes, you have to cast the addresses to char, because sizeof(char) is 1 by definition.

The use of volatile is perhaps a cautious step to ensure that no compiler optimisations (either that exist now or may be added in the future) will change the precise meaning of the cast.

Update:

If we look at the macro definition:

(size_t)&reinterpret_cast<const volatile char&>((((s *)0)->m))

With the cast-to-char removed it would be:

(size_t)&((((s *)0)->m))

In other words, get the address of member m in an object at address zero, which does look okay at first glance. So there must be some way that this would potentially cause a problem.

One thing that springs to mind is that the operator & may be overloaded on whatever type m happens to be. If so, this macro would be executing arbitrary code on an "artificial" object that is somewhere quite close to address zero. This would probably cause an access violation.

This kind of abuse may be outside the applicability of offsetof, which is supposed to only be used with POD types. Perhaps the idea is that it is better to return a junk value instead of crashing.

(Update 2: As Steve pointed out in the comments, there would be no similar problem with operator ->)
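The overloaded operator& concern can be sketched as follows. This is an illustration only: Tricky, Holder, real_address and offset_of_m are hypothetical names, and the sketch works on a real object rather than a null pointer so that it stays well defined.

```cpp
#include <cassert>   // assert (for callers checking the results)
#include <cstddef>   // std::size_t, offsetof

// Hypothetical member type that overloads unary operator&.
struct Tricky {
    int value;
    // Deliberately returns something other than the real address.
    const Tricky* operator&() const { return nullptr; }
};

struct Holder {
    int    first;
    Tricky m;
};

// A plain &t would call Tricky::operator&. Binding the object to a
// const volatile char& first means the built-in & is applied to a char lvalue.
inline const volatile char* real_address(const Tricky& t) {
    return &reinterpret_cast<const volatile char&>(t);
}

// Offset of Holder::m, computed from a real object instead of address zero.
inline std::size_t offset_of_m(const Holder& h) {
    const volatile char* base   = &reinterpret_cast<const volatile char&>(h);
    const volatile char* member = real_address(h.m);
    return static_cast<std::size_t>(member - base);
}
```

With a Holder h, the expression &h.m yields the null pointer from the overload, while offset_of_m(h) should agree with offsetof(Holder, m).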

Daniel Earwicker
Frederick
The overloaded operator-> isn't a problem. To get the overloaded behaviour, you'd apply it to the instance - not the pointer.
Steve314
@Steve - you're right, I've removed that.
Daniel Earwicker
A: 

char is guaranteed to be the smallest number of bits the architecture can "bite" (aka byte).

All pointers are actually numbers, so cast address 0 to that type because it's the beginning.

Take the address of member starting from 0 (resulting in 0 + location_of_m).

Cast that back to size_t.
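The arithmetic behind those steps can be sketched with a real object. The Sample struct and offset_of_b are hypothetical names, and the sketch assumes the usual flat address space in which converting a pointer to std::uintptr_t preserves byte distances.

```cpp
#include <cassert>   // assert
#include <cstddef>   // std::size_t, offsetof
#include <cstdint>   // std::uintptr_t

// Hypothetical struct, used only for illustration.
struct Sample {
    char  a;
    int   b;
    short c;
};

// offset = address(member) - address(object).
// If the object sat at address 0, the subtraction would be a no-op,
// which is the shortcut the macro takes.
inline std::size_t offset_of_b(const Sample& s) {
    const std::uintptr_t base   = reinterpret_cast<std::uintptr_t>(&s);
    const std::uintptr_t member = reinterpret_cast<std::uintptr_t>(&s.b);
    assert(member >= base);  // a member never precedes its object
    return static_cast<std::size_t>(member - base);
}
```

On mainstream platforms offset_of_b(s) should match offsetof(Sample, b) for any Sample s.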

LiraNuna
It doesn't "cast to 0" (impossible because `0` is not a type). It casts the value `0` to type `s *`, to get a notional object of type `s` at memory address `0`. From that point on, your description makes sense.
Daniel Earwicker
My bad, English is not my primary language, so I make mistakes often :).
LiraNuna
+1  A: 

offsetof is something to be very careful with in C++. It's a relic from C. These days we are supposed to use member pointers. That said, I believe that member pointers to data members are overdesigned and broken - I actually prefer offsetof.
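For comparison, this is roughly what the member pointer alternative looks like. Point, get_field and set_field are hypothetical names chosen for this sketch.

```cpp
#include <cassert>   // assert

// Hypothetical struct, used only for illustration.
struct Point {
    int x;
    int y;
};

// A pointer to data member names a field without exposing a byte offset.
inline int get_field(const Point& p, int Point::* field) {
    assert(field != nullptr);  // a null member pointer names no field
    return p.*field;
}

inline void set_field(Point& p, int Point::* field, int value) {
    assert(field != nullptr);
    p.*field = value;
}
```

For example, get_field(p, &Point::y) reads p.y, and set_field(p, &Point::x, 10) writes p.x.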

Even so, offsetof is full of nasty surprises.

First, for your specific questions, I suspect the real issue is that they've adapted the traditional C macro (which I thought was mandated in the C++ standard). They probably use reinterpret_cast for "it's C++!" reasons (so why the (size_t) cast?), and a char& rather than a char* to try to simplify the expression a little.

Casting to char looks redundant in this form, but probably isn't. (size_t) is not equivalent to reinterpret_cast, and if you try to cast pointers to other types into integers, you run into problems. I don't think the compiler even allows it, but to be honest, I'm suffering memory failure ATM.

The fact that char is a single byte type has some relevance in the traditional form, but that may only be why the cast is correct again. To be honest, I seem to remember casting to void*, then char*.

Incidentally, having gone to the trouble of using C++-specific stuff, they really should be using std::ptrdiff_t for the final cast.

Anyway, coming back to the nasty surprises...

VC++ and GCC probably won't use that macro. IIRC, they have a compiler intrinsic, depending on options.

The reason is to do what offsetof is intended to do, rather than what the macro does, which is reliable in C but not in C++. To understand this, consider what would happen if your struct uses multiple or virtual inheritance. In the macro, when you dereference a null pointer, you end up trying to access a virtual table pointer that isn't there at address zero, meaning that your app probably crashes.

For this reason, some compilers have an intrinsic that just uses the specified struct's layout instead of trying to deduce a run-time type. But the C++ standard doesn't mandate or even suggest this - it's only there for C compatibility reasons. And you still have to be careful if you're working with class hierarchies, because as soon as you use multiple or virtual inheritance, you cannot assume that the layout of the derived class matches the layout of the base class - you have to ensure that the offset is valid for the exact run-time type, not just a particular base.
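One way to catch the risky cases at compile time is sketched below. It assumes a C++11 compiler (so it postdates Visual C++ 2008) and uses std::is_standard_layout as the test, which is an assumption of this sketch rather than anything the macro itself does. PlainNode, SharedBase, VirtualNode and has_standard_layout are hypothetical names.

```cpp
#include <cassert>      // assert (for callers checking the results)
#include <cstddef>      // offsetof, std::size_t
#include <type_traits>  // std::is_standard_layout

// Hypothetical node types, used only for illustration.
struct PlainNode {
    int        key;
    PlainNode* next;
};

struct SharedBase {
    int shared;
};

struct VirtualNode : virtual SharedBase {
    int key;
};

// True when T's layout is simple enough for offsetof to be portable.
template <typename T>
constexpr bool has_standard_layout() {
    return std::is_standard_layout<T>::value;
}

static_assert(has_standard_layout<PlainNode>(),
              "offsetof is fine for a plain struct");
static_assert(!has_standard_layout<VirtualNode>(),
              "virtual inheritance rules out a portable offsetof");

// Only the plain node is queried; its first member sits at offset 0.
inline std::size_t plain_key_offset() {
    return offsetof(PlainNode, key);
}
```

A data structure library could place a similar static_assert next to every offsetof it relies on, so that adding a virtual base later fails to compile instead of failing at run time.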

If you're working on a data structure library, maybe using single inheritance for nodes, but apps cannot see or use your nodes directly, offsetof works well. But strictly speaking, even then, there's a gotcha. If your data structure is in a template, the nodes may have fields with types from template parameters (the contained data type). If that isn't POD, technically your structs aren't POD either. And all the standard demands for offsetof is that it works for POD. In practice, it will work - your type hasn't gained a virtual table or anything just because it has a non-POD member - but you have no guarantees.

If you know the exact run-time type when you dereference using a field offset, you should be OK even with multiple and virtual inheritance, but ONLY if the compiler provides an intrinsic implementation of offsetof to derive that offset in the first place. My advice - don't do it.

Why use inheritance in a data structure library? Well, how about...

class node_base                       { ... };
class leaf_node   : public node_base  { ... };
class branch_node : public node_base  { ... };

The fields in the node_base are automatically shared (with identical layout) in both the leaf and branch, avoiding a common error in C with accidentally different node layouts.

BTW - offsetof is avoidable with this kind of stuff. Even if you are using offsetof for some jobs, node_base can still have virtual methods and therefore a virtual table, so long as it isn't needed to dereference member variables. Therefore, node_base can have pure virtual getters, setters and other methods. Normally, that's exactly what you should do. Using offsetof (or member pointers) is a complication, and should only be used as an optimisation if you know you need it. If your data structure is in a disk file, for instance, you definitely don't need it - a few virtual call overheads will be insignificant compared with the disk access overheads, so any optimisation efforts should go into minimising disk accesses.

Hmmm - went off on a bit of a tangent there. Whoops.

Steve314
A: 

1) I also do not know why it is done in this way.

2) The char type is special in two ways.

No other type has weaker alignment restrictions than the char type. This is important for reinterpret_cast between pointers, and from an expression to a reference.

It is also the only type (together with its unsigned variant) for which the specification defines the behavior when it is used to access the stored value of a variable of a different type. I do not know if this applies to this specific situation.

3) I think that the volatile modifier is used to ensure that no compiler optimization will result in an attempt to read the memory.
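The access rule mentioned above can be illustrated with a short sketch that reads another object's bytes through unsigned char. The copy_bytes helper is a hypothetical name, and the sketch assumes 8-bit bytes so that std::uint32_t occupies four of them.

```cpp
#include <cassert>   // assert (for callers checking the results)
#include <cstddef>   // std::size_t
#include <cstdint>   // std::uint32_t

// Copies the object representation of `value` into `out`, one byte at a time.
// Reading any object through an unsigned char lvalue is permitted.
inline void copy_bytes(const std::uint32_t& value,
                       unsigned char (&out)[sizeof(std::uint32_t)]) {
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&value);
    for (std::size_t i = 0; i < sizeof value; ++i) {
        out[i] = p[i];
    }
}
```

The order of the bytes in out depends on the platform's endianness, but their values are the bytes of the original integer.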

Komat