In ANSI C, offsetof is defined as below.

#define offsetof(st, m) \
    ((size_t) ( (char *)&((st *)(0))->m - (char *)0 ))

Why doesn't this cause a segmentation fault, since we are dereferencing a NULL pointer? Or is this some sort of compiler hack, where the compiler sees that only the address of the member is taken and so calculates the address statically without actually dereferencing anything? Also, is this code portable?

+29  A: 

At no point in the above code is anything dereferenced. A dereference occurs when * or -> is applied to an address value to fetch the referenced value. The only use of * above is in a type name, for the purpose of casting.

The -> operator is used above, but it's not used to access the value. Instead it's used to grab the address of the value. Here is a non-macro code sample that should make it a bit clearer:

SomeType *pSomeType = GetTheValue();
int* pMember = &(pSomeType->SomeIntMember);

The second line does not actually cause a dereference. It simply returns the address of SomeIntMember within the pSomeType value.
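
As a rough, compilable sketch of the same idea (the struct and function names here are made up for illustration and are not part of the question): with a valid pointer, `&p->value` only computes an address and does not read the member. With a null pointer the expression is formally undefined in standard C, even though many compilers accept it, which is what the comments below argue about.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical struct, used only for illustration. */
struct sample {
    char tag;
    int  value;
};

/* Returns the address of the member without reading the member itself. */
int *value_address(struct sample *p)
{
    assert(p != NULL);   /* with a null p the expression below is formally undefined */
    return &p->value;    /* address computation only; p->value is not loaded */
}
```

Calling `value_address(&s)` on a real object `s` gives the same pointer as writing `&s.value` directly.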

What you see is a lot of casting between arbitrary types and char pointers. The reason for char is that it's one of the few types (perhaps the only one) in the C89 standard with an explicitly defined size: sizeof(char) is 1. Because the size is 1, subtracting two char pointers yields a distance in bytes, so the above code can do the evil magic of calculating the true offset of the value.

JaredPar
+1 for evil magic ;-).
Gamecat
@Gamecat, I just call them like I see them ;)
JaredPar
Doesn't the indirect member selector (->) count as a dereference in this context?
Sean Bright
@JaredPar, AFAIK, size of char* is 4 in a 32 bit machine.
chappar
I thought a->b is just syntactic sugar for (*a).b
Dan
As Sean Bright pointed out, -> operator dereferences the pointer
chappar
@chappar corrected
JaredPar
@Sean it's part of the evil magic. It's taking the address of the value without actively dereferencing
JaredPar
@chappar, the size of char * varies, but the size of char is defined as 1 by the standard. Everything else depends on the implementation.
David Thornley
@Jared: care to expand on the "evil magic" part? Or are we just leaving it at that? :)
Sean Bright
David Thornley
Dan
Good enough for me.
Sean Bright
I don't have a C standard available, but I thought I remembered something in C90 about not necessarily being able to use (not only dereference) arbitrary addresses. The rationale was machines like the 8086 and IBM 370 that used segment registers, and couldn't refer to their entire address space.
David Thornley
chappar
@chappar, the subtraction is needed to convert the pointer type into an integer type. The macro returns a size_t as the offset of the member in question from address 0.
Matt Kane
@mkb, we are casting the pointer value to size_t anyway. So I think that subtraction was not really needed.
chappar
JaredPar
(cont) pointer subtraction (especially with char) though is defined and will result in the correct value with the appropriate type.
JaredPar
I don't know about C but I am *very* sure that this is illegal C++. We tried to exploit a similar situation in our code and the VC++ debugger actually stopped execution because of the invalid dereferencing (even though no value was read/written, it was only used as above to calculate an address).
Konrad Rudolph
@JaredPar, you may be right. That subtraction might be there to suppress compiler warnings.
chappar
BTW, the subtraction is probably needed because of alignment issues in memory. I'm not entirely sure but a cast from `ptrdiff_t` to `size_t` is always feasible and well-defined, while a cast from `char*` to `size_t` might not be (again, it isn't in C++).
Konrad Rudolph
+1  A: 

It doesn't segfault because you're not dereferencing it. The pointer address is being used as a number that's subtracted from another number, not used to address memory operations.

chaos
+3  A: 

It calculates the offset of the member m relative to the start address of the representation of an object of type st.

((st *)(0)) refers to a NULL pointer of type st *. &((st *)(0))->m refers to the address of member m in this object. Since the start address of this object is 0 (NULL), the address of member m is exactly the offset.

The conversion to char * and the subtraction yield the offset in bytes. Under the rules of pointer arithmetic, the difference of two pointers of type T * is the number of objects of type T that lie between the two addresses held by the operands.
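
A small sketch of that rule, using an ordinary array so that both subtractions are well defined (the function names are invented for this example): the same two addresses are 3 apart when counted in ints and 3 * sizeof(int) apart when counted in chars.

```c
#include <assert.h>
#include <stddef.h>

/* Distance between element 0 and element 3, counted in ints. */
ptrdiff_t distance_in_ints(void)
{
    int a[4] = {0};
    return &a[3] - &a[0];
}

/* The same distance, counted in bytes by going through char *. */
ptrdiff_t distance_in_bytes(void)
{
    int a[4] = {0};
    assert(sizeof(char) == 1);   /* guaranteed by the standard */
    return (char *)&a[3] - (char *)&a[0];
}
```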

Cătălin Pitiș
chappar
I think that the subtraction is not really needed, but I am not 100% sure...
Cătălin Pitiș
+4  A: 

In ANSI C, offsetof is NOT defined like that. One of the reasons it's not defined like that is that some environments will indeed throw null pointer exceptions, or crash in other ways. Hence, ANSI C leaves the implementation of offsetof( ) open to compiler builders.

The code shown above is typical for compilers/environments that do not actively check for NULL pointers, but fail only when bytes are read from a NULL pointer.

MSalters
+5  A: 

To answer the last part of the question, the code is not portable.

The result of subtracting two pointers is defined and portable only if the two pointers point to objects in the same array, or to one past the last object of the array (7.6.2 Additive Operators, H&S Fifth Edition).
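
For comparison, here is a sketch of an offset calculation that stays inside that rule by measuring within one real object, whose bytes can be viewed as an array of char. The struct and function names are assumptions made up for this example; in real code the standard offsetof from <stddef.h> is the thing to use.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical struct, for illustration only. */
struct record {
    char   flag;
    double amount;
};

/* Offset of `amount`, measured between two addresses inside the same object. */
size_t amount_offset(void)
{
    struct record r;                               /* never read, only addressed */
    const char *base   = (const char *)&r;         /* first byte of the object */
    const char *member = (const char *)&r.amount;  /* first byte of the member */
    assert(member >= base);
    return (size_t)(member - base);
}
```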

sigjuice
+1  A: 

Although that is a typical implementation of offsetof, it is not mandated by the standard, which just says:

The following types and macros are defined in the standard header <stddef.h> [...]

    offsetof(type, member-designator)

which expands to an integer constant expression that has type size_t, the value of which is the offset in bytes, to the structure member (designated by member-designator), from the beginning of its structure (designated by type). The type and member designator shall be such that given

    static type t;

then the expression &(t.member-designator) evaluates to an address constant. (If the specified member is a bit-field, the behavior is undefined.)
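
To illustrate the "integer constant expression" part of that wording, here is a minimal sketch (struct and names invented for the example) that uses the standard offsetof where a compile-time constant is required, namely as an array size at file scope.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical struct, for illustration only. */
struct header {
    char kind;
    long length;
    char payload[8];
};

/* offsetof is an integer constant expression, so it can size a file-scope array. */
static char bytes_before_payload[offsetof(struct header, payload)];

size_t payload_offset(void)
{
    assert(offsetof(struct header, kind) == 0);   /* the first member sits at offset 0 */
    return sizeof bytes_before_payload;
}
```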

Read P J Plauger's "The Standard C Library" for a discussion of it and the other items in <stddef.h> which are all border-line features that could (should?) be in the language proper, and which might require special compiler support.

It's of historic interest only, but I used an early ANSI C compiler on 386/IX (see, I told you of historic interest, circa 1990) that crashed on that version of offsetof but worked when I revised it to:

#define offsetof(st, m) ((size_t)((char *)&((st *)(1024))->m - (char *)1024))

That was a compiler bug of sorts, not least because the header was distributed with the compiler and didn't work.

Jonathan Leffler