Hi

I have read (in Inside the C++ Object Model) that the value of a pointer to data member in C++ is the offset of the data member plus 1.
I am trying this on VC++ 2005, but I am not getting those offset values.
For example:

#include <cstdio>

class X {
  public:
    int a;
    int b;
    int c;
};

void x(){  
  printf("Offsets of a=%d, b=%d, c=%d",&X::a,&X::b,&X::c);
}  

This should print Offsets of a=1, b=5, c=9, but in VC++ 2005 it prints a=0, b=4, c=8.
I am not able to understand this behavior.
Excerpt from the book:

"That expectation, however, is off by one—a somewhat traditional error for both C and C++ programmers.

The physical offset of the three coordinate members within the class layout are, respectively, either 0, 4, and 8 if the vptr is placed at the end or 4, 8, and 12 if the vptr is placed at the start of the class. The value returned from taking the member's address, however, is always bumped up by 1. Thus the actual values are 1, 5, and 9, and so on.
The problem is distinguishing between a pointer to no data member and a pointer to the first data member. Consider for example:

float Point3d::*p1 = 0;
float Point3d::*p2 = &Point3d::x;

// oops: how to distinguish?
if ( p1 == p2 ) {
cout << " p1 & p2 contain the same value — ";
cout << " they must address the same member!" << endl;
}
To distinguish between p1 and p2, each actual member offset value is bumped up by 1. Hence, both the compiler (and the user) must remember to subtract 1 before actually using the value to address a member."

+8  A: 

The offset of something is how many units it is from the start. The first thing is at the start so its offset is zero.

Think in terms of your structure being at memory location 100:

100: class X { int a;
104:           int b;
108:           int c;

As you can see, the address of a is the same as the address of the entire structure, so its offset (what you have to add to the structure address to get the item address) is 0.

Note that the ISO standard doesn't specify where the items are laid out in memory. Padding bytes to create correct alignment are certainly possible. In a hypothetical environment where ints were only two bytes but their required alignment was 256 bytes, they wouldn't be at 0, 2 and 4 but rather at 0, 256 and 512.
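To see the offsets on your own compiler without going through pointers to members at all, here is a minimal sketch using the standard `offsetof` macro. The struct mirrors the question's `X`; the function name `print_offsets` is made up for this example. With a 4-byte `int` and no padding it would typically report 0, 4 and 8, but only the 0 for the first member is guaranteed.

```cpp
#include <cstddef>   // offsetof, std::size_t
#include <cstdio>    // std::printf

// Same members as the class in the question.
struct X {
    int a;
    int b;
    int c;
};

// Print the byte offset of each member from the start of an X object.
// %zu is the printf conversion for std::size_t (C++11 and later).
void print_offsets() {
    std::printf("Offsets of a=%zu, b=%zu, c=%zu\n",
                offsetof(X, a), offsetof(X, b), offsetof(X, c));
}
```

Calling `print_offsets()` from `main` shows the plain layout offsets, which is the quantity the book's "plus 1" is applied to.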


And, if that book you're taking the excerpt from is really Inside the C++ Object Model, it's getting a little long in the tooth.

The fact that it's from '96 and discusses the internals underneath C++ (waxing lyrical about how good it is to know where the vptr is, missing the whole point that that's working at the wrong abstraction level and you should never care) dates it quite a bit. In fact, the introduction even states "Explains the basic implementation of the object-oriented features ..." (my italics).

And the fact that nobody can find anything in the ISO standard saying this behaviour is required, along with the fact that neither MSVC nor gcc acts that way, leads me to believe that, even if this was true of one particular implementation far in the past, it's not true (or required to be true) of all.

The author apparently led the cfront 2.1 and 3 teams and, while this book seems of historical interest, I don't think it's relevant to the modern C++ language (and implementation), at least those bits I've read.

paxdiablo
+3  A: 

The behavior you're getting looks quite reasonable to me. What sounds wrong is what you read.

Jerry Coffin
Not to mention that having member variables on an uneven address would be quite inefficient.
humbagumba
have added excerpt from the book. Please have a look
theneuronarc
I've looked. I still think what I said above is pretty accurate -- at best, he's describing a method used by some particular compiler, not a general requirement. Offhand, I'm not sure I've ever seen a compiler that worked that way, but even if it did, I don't see much relevance.
Jerry Coffin
+1  A: 

I have read that address of pointer to data member in C++ is the offset of data member plus 1?

I have never heard that, and your own empirical evidence shows it's not the case. I think you misunderstood an odd property of structs and classes in C++. If they are completely empty, they nevertheless have a size of at least 1 (so that each element of an array of them has a unique address).
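A small sketch of that property, assuming nothing beyond the standard library. The names `Empty` and `element_stride` are invented for the example.

```cpp
#include <cstddef>   // std::size_t

// A class with no data members at all.
struct Empty {};

// Distance in bytes between two neighbouring array elements.
// It equals sizeof(Empty), which is never zero.
std::size_t element_stride() {
    Empty arr[2];
    return static_cast<std::size_t>(
        reinterpret_cast<char*>(&arr[1]) - reinterpret_cast<char*>(&arr[0]));
}
```

This is unrelated to how pointers to members are encoded; it only shows where a stray "1" can come from when looking at object sizes.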

James Curran
I have added excerpt from the book. Please have a look.
theneuronarc
A: 

$9.2/12 is interesting

Nonstatic data members of a (non-union) class declared without an intervening access-specifier are allocated so that later members have higher addresses within a class object. The order of allocation of nonstatic data members separated by an access-specifier is unspecified (11.1). Implementation alignment requirements might cause two adjacent members not to be allocated immediately after each other; so might requirements for space for managing virtual functions (10.3) and virtual base classes (10.1).

This explains that such behavior is implementation defined. However the fact that 'a', 'b' and 'c' are at increasing addresses is in accordance with the Standard.
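As a hedged illustration of the quoted paragraph, the sketch below checks that later members get higher offsets and measures the gap alignment may leave between two of them. `Mixed`, `offsets_increase` and `padding_before_value` are names made up here, and the amount of padding is whatever your implementation chooses.

```cpp
#include <cstddef>   // offsetof, std::size_t

// Members of different sizes, all with the same access, so their order is fixed.
struct Mixed {
    char tag;    // 1 byte
    int  value;  // usually needs stricter alignment, so padding may precede it
    char flag;
};

// True when members declared later sit at higher offsets.
bool offsets_increase() {
    return offsetof(Mixed, tag) < offsetof(Mixed, value) &&
           offsetof(Mixed, value) < offsetof(Mixed, flag);
}

// Bytes of padding between the end of `tag` and the start of `value`.
std::size_t padding_before_value() {
    return offsetof(Mixed, value) - (offsetof(Mixed, tag) + sizeof(char));
}
```

A non-zero result from `padding_before_value()` means the two members are at increasing addresses but not contiguous.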

Chubsdad
But could it be that those addresses are not contiguous?
JustBoo
+4  A: 

Firstly, the internal representation of values of a pointer to a data member type is an implementation detail. It can be done in many different ways. You came across a description of one possible implementation, where the pointer contains the offset of the member plus 1. It is rather obvious where that "plus 1" comes from: that specific implementation wants to reserve the physical zero value (0x0) for the null pointer, so the offset of the first data member (which could easily be 0) has to be transformed to something else to make it different from a null pointer. Adding 1 to all such pointers solves the problem.

However, it should be noted that this is a rather cumbersome approach (i.e. the compiler always has to subtract 1 from the physical value before performing access). That implementation was apparently trying very hard to make sure that all null-pointers are represented by a physical zero-bit pattern. To tell the truth, I haven't encountered implementations that follow this approach in practice these days.
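To make the "offset + 1" scheme concrete, here is a hand-written emulation of it. This is only a sketch of the idea the book describes, not code from any real compiler: `BiasedMemberPtr`, `kNoMember`, `encode`, `is_null` and `deref` are all hypothetical names.

```cpp
#include <cstddef>   // offsetof, std::size_t

struct Point3d {
    float x, y, z;
};

// Hypothetical encoding: 0 means "no member",
// any other value is (byte offset + 1).
struct BiasedMemberPtr {
    std::size_t raw;
};

const BiasedMemberPtr kNoMember = {0};

// Store the offset bumped up by 1 so that offset 0 stays distinct from null.
BiasedMemberPtr encode(std::size_t offset) {
    BiasedMemberPtr p = {offset + 1};
    return p;
}

bool is_null(BiasedMemberPtr p) { return p.raw == 0; }

// Dereference: subtract the bias, then add the offset to the object address.
// The caller must not pass a null BiasedMemberPtr.
float& deref(Point3d& obj, BiasedMemberPtr p) {
    char* base = reinterpret_cast<char*>(&obj);
    return *reinterpret_cast<float*>(base + (p.raw - 1));
}
```

The extra subtraction in `deref` is the cost mentioned above: every access through such a pointer has to undo the bias first.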

Today, most popular implementations (like GCC or MSVC++) use just the plain offset (not adding anything to it) as the internal representation of the pointer to a data member. The physical zero will, of course, no longer work for representing null pointers, so they use some other physical value to represent null pointers, like 0xFFFF... (this is what GCC and MSVC++ use).

Secondly, I don't understand what you were trying to say with your p1 and p2 example. You are absolutely wrong to assume that the pointers will contain the same value. They won't.

If we follow the approach described in your post ("offset + 1"), then p1 will receive the physical value of the null pointer (apparently a physical 0x0), while p2 will receive the physical value 0x1 (assuming x has offset 0). 0x0 and 0x1 are two different values.

If we follow the approach used by modern GCC and MSVC++ compilers, then p1 will receive the physical value of 0xFFFF.... (null pointer), while p2 will be assigned a physical 0x0. 0xFFFF... and 0x0 are again different values.

P.S. I just realized that the p1 and p2 example is actually not yours, but a quote from a book. Well, the book, once again, is describing the same problem I mentioned above - the conflict of 0 offset with 0x0 representation for null pointer, and offers one possible viable approach to solving that conflict. But, once again, there are alternative ways to do it, and many compilers today use completely different approaches.
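Whichever representation the compiler picks, the observable behaviour is the same. A minimal sketch, using the `Point3d` shape from the book excerpt (the two function names are made up for the example):

```cpp
struct Point3d {
    float x, y, z;
};

// A null pointer to member never compares equal to a pointer to a real member,
// whatever bit pattern the compiler uses for either of them.
bool null_differs_from_first_member() {
    float Point3d::*p1 = 0;             // null pointer to member
    float Point3d::*p2 = &Point3d::x;   // first data member
    return p1 != p2;
}

// Conversion to bool is a test against the null value,
// not against a raw bit pattern of zero.
bool first_member_is_non_null() {
    float Point3d::*p2 = &Point3d::x;
    return p2 ? true : false;
}
```

Both functions should return true on any conforming compiler, so the `if ( p1 == p2 )` branch in the book's example is never taken.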

AndreyT
MSVC actually has three or four different pointer to member representations depending on the inheritance model assumed for forward declared types.
MSN
@MSN: That usually applies to pointers to member *functions*. Pointers to *data* members are notably simpler. They are significantly less sensitive to the inheritance model (or not sensitive at all). Normally, one can implement them as plain offset in *any* inheritance model. If MSVC++ is doing something more complicated, I don't know the reason for that.
AndreyT
@AndreyT: You have identified the right problem. It is not about alignment issues. It's about differentiating a null pointer to data member from an initialized one. Thanks.
theneuronarc
@AndreyT, you are forgetting pointer to members of virtual base classes. That one is also less forgiving.
MSN
@MSN: No, I'm not forgetting anything. The issue with pointers to member *functions* is that in the general case the non-trivial calculation of the proper `this` pointer value has to be performed at the moment of *dereference*. This is why pointers to member *functions* have to carry quite a bit of extra information with them. This is why they are so complicated.
AndreyT
Pointers to *data* members are much simpler and all `this`-related calculations can be performed at the moment of conversion (up or down the hierarchy). This is why pointers to data members can be implemented as simple offsets in *all* cases, even if virtual base classes are involved.
AndreyT
@AndreyT, how exactly do you encode a pointer to member to a virtual base class in a single offset? (And yes, I was originally thinking of member function pointers, not member pointers.)
MSN
@AndreyT, never mind. I understand what you stated now; basically, you can apply the offset when dereferencing after determining which type it is relative to since that is available from the member pointer type itself.
MSN
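A small sketch of the conversion discussed in these comments, restricted to ordinary (non-virtual) multiple inheritance. `Left`, `Right`, `Both` and `read_through_derived` are names invented for the example.

```cpp
struct Left  { int l; };
struct Right { int r; };
struct Both : Left, Right { int own; };

// `int Right::*` converts implicitly to `int Both::*`.
// Any adjustment for where the Right subobject lives inside Both
// can be applied by the compiler at this conversion.
int read_through_derived(const Both& obj) {
    int Right::* base_pm = &Right::r;
    int Both::*  derived_pm = base_pm;   // base-to-derived: implicit
    return obj.*derived_pm;
}
```

After the conversion, `derived_pm` still designates `r`, even though `r` is not at the start of a `Both` object.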
A: 

You may want to check out http://stackoverflow.com/questions/405112/how-are-objects-stored-in-memory-in-c which talks about this issue in much more detail.

BT
+1  A: 

To complement AndreyT's answer: Try running this code on your compiler.

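#include <cstddef>   // NULL
#include <iostream>

// X is the class from the question (public int members a, b, c).
void test()
{
    using namespace std;

    int X::* pm = NULL;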
    cout << "NULL pointer to member: "
        << " value = " << pm 
        << ", raw byte value = 0x" << hex << *(unsigned int*)&pm << endl;

    pm = &X::a;
    cout << "pointer to member a: "
        << " value = " << pm 
        << ", raw byte value = 0x" << hex << *(unsigned int*)&pm << endl;

    pm = &X::b;
    cout << "pointer to member b: "
        << " value = " << pm 
        << ", raw byte value = 0x" << hex << *(unsigned int*)&pm << endl;
}

On Visual Studio 2008 I get:

NULL pointer to member:  value = 0, raw byte value = 0xffffffff
pointer to member a:  value = 1, raw byte value = 0x0
pointer to member b:  value = 1, raw byte value = 0x4

So indeed, this particular compiler is using a special bit pattern to represent a NULL pointer and thus leaving an 0x0 bit pattern as representing a pointer to the first member of an object.

This also means that wherever the compiler generates code to translate such a pointer to an integer or a boolean, it must be taking care to look for that special bit pattern. Thus something like if(pm) or the conversion performed by the << stream operator is actually written by the compiler as a test against the 0xffffffff bit pattern (instead of how we typically like to think of pointer tests being a raw test against address 0x0).
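The cast to `unsigned int*` above assumes the pointer to member is exactly as wide as an `unsigned int`, which may not hold on every platform. A hedged variant that copies out however many bytes the pointer really occupies might look like this (`raw_bytes` is a name made up here; the bytes are shown in memory order, so a little-endian machine lists the low byte first):

```cpp
#include <cstddef>   // std::size_t
#include <cstdio>    // std::snprintf
#include <cstring>   // std::memcpy
#include <string>

struct X {
    int a;
    int b;
    int c;
};

// Hex dump of the object representation of a pointer to data member.
std::string raw_bytes(int X::* pm) {
    unsigned char bytes[sizeof(int X::*)];
    std::memcpy(bytes, &pm, sizeof bytes);   // copy the raw representation

    std::string out;
    char buf[3];                             // two hex digits plus terminator
    for (std::size_t i = 0; i < sizeof bytes; ++i) {
        std::snprintf(buf, sizeof buf, "%02x", bytes[i]);
        out += buf;
    }
    return out;
}
```

Printing `raw_bytes(pm)` for a null pointer to member, `&X::a` and `&X::b` gives the same kind of comparison as the output above, without depending on the pointer's size.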

TheUndeadFish