
I asked this a while ago on comp.std.c++ and got no reply.

I'm just going to quote my post there with little modification.


Is the last requirement of standard-layout classes, 9/6 (a standard-layout class has no base classes of the same type as its first non-static data member), necessary or useful?
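For concreteness, the requirement can be checked with the C++11 trait `std::is_standard_layout`. This is a minimal sketch: the type names `Empty`, `FirstIsBase` and `FirstIsOther` are hypothetical, and the expected results assume a compiler that implements the C++11 standard-layout rules.

```cpp
#include <type_traits>

struct Empty {};

// The first non-static data member has the same type as the (empty) base,
// which violates the rule, so this class is not standard-layout.
struct FirstIsBase : Empty {
    Empty e;
    int i;
};

// Same members in the other order: the first member is an int,
// so the rule is satisfied and the class is standard-layout.
struct FirstIsOther : Empty {
    int i;
    Empty e;
};

static_assert(!std::is_standard_layout<FirstIsBase>::value,
              "base type matches the first member: not standard-layout");
static_assert(std::is_standard_layout<FirstIsOther>::value,
              "first member is an int: standard-layout");
```

Only the position of the `Empty` member differs between the two classes, which is what makes this particular rule look arbitrary.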

A footnote explanation is provided:

This ensures that two subobjects that have the same class type and that belong to the same most-derived object are not allocated at the same address (5.10).

Taken alone, the footnote is incorrect. Two empty base classes with a common base class may produce two instances of the base class at the same address.

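struct A {};
struct B : A {};
struct C : A {};
struct D : B, C {};

int main() {
    D d;
    // Two distinct A subobjects; allowed per 1.8/5 to share an address.
    bool same = static_cast<A*>(static_cast<B*>(&d))
             == static_cast<A*>(static_cast<C*>(&d));
    return same ? 1 : 0;  // exit status 1 if the addresses coincide
}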

Taken in the context of 5.10, subobjects are mentioned only in the requirements for comparing pointers to members. Base subobjects are irrelevant there. Moreover, it doesn't make sense to give comparisons between a (scalar) pointer to a member subobject and a pointer to a base subobject a special status above comparisons between pointers to two base subobjects.

There wasn't such a restriction in C++03. Even if there is an ABI out there that requires every member to be allocated at a different address from any base of the same type, yet already allows the empty base class optimization on the above code, I think the ABI is buggy and the standard shouldn't capture this.

The language goes back to N2172, which suggests that multiple inheritance might cause trouble and might need to be disallowed in standard-layout classes to ensure ABI compatibility; however, multiple inheritance was ultimately allowed, and in that light the requirement doesn't make sense.


For reference, 1.8/5-6:

5 Unless it is a bit-field (9.6), a most derived object shall have a non-zero size and shall occupy one or more bytes of storage. Base class subobjects may have zero size. An object of trivially copyable or standard-layout type (3.9) shall occupy contiguous bytes of storage.

6 Unless an object is a bit-field or a base class subobject of zero size, the address of that object is the address of the first byte it occupies. Two distinct objects that are neither bit-fields nor base class subobjects of zero size shall have distinct addresses.

(footnote) Under the “as-if” rule an implementation is allowed to store two objects at the same machine address or not store an object at all if the program cannot observe the difference.
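A small sketch of what 1.8/5-6 guarantee for complete objects, assuming a C++11 compiler; `Empty` and `check_complete_objects` are hypothetical names. Base subobjects are deliberately left out, since their placement is exactly what this thread is about.

```cpp
#include <cassert>

struct Empty {};

// 1.8/5: a most derived object of an empty class still occupies at least one byte.
static_assert(sizeof(Empty) >= 1, "complete objects have non-zero size");

// 1.8/6: distinct complete objects (here, two array elements) get distinct
// addresses, even though Empty carries no data.
void check_complete_objects() {
    Empty pair[2];
    assert(&pair[0] != &pair[1]);
}
```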

Additional notes:

10.1/8 refers to the same mystery content at 5.10, but it's also just an informative note.

[Note: … A base class subobject may be of zero size (Clause 9); however, two subobjects that have the same class type and that belong to the same most derived object must not be allocated at the same address (5.10). — end note ]

GCC appears to guarantee that empty base subobjects of the same type are given unique addresses. Example program and output. This seems sufficient to guarantee that objects of a given type are uniquely identified by address. That would be above and beyond the guarantees of the C++ object model, §1.8. Of course this is a good idea, but it doesn't seem required by the Standard. Likewise, the platform ABI can extend this guarantee to a class with the first member aliasing an empty base. The language sets minimum requirements for ABIs; an ABI can add a language feature, and other ABIs can follow suit, and the process of catch-up by the Standard is simply error-prone.

My question is whether the given requirement accomplishes anything in the context of the Standard, not whether it is useful to the user in concert with other ABI guarantees. Evidence that such a unique-address guarantee was intended, and only omitted by accident, would also make the requirement more meaningful.


To summarize the answer (or my conclusion, anyway):

The requirement does not theoretically ensure anything, as it's possible anyway to ensure that all objects of a given type have different addresses. When the address of an empty base class subobject conflicts with another object (either another base or a member), the compiler may simply assign it an arbitrary location within the structure. As the standard-layout rules only describe the locations of data members (possibly inherited), the locations of empty bases are still unspecified and perhaps incompatible between similar standard-layout classes. (The locations of non-empty bases are still unspecified as far as I've noticed, and then it's not clear what is meant by "first member" in that case, but they must be consistent in any case.)

In practice, the requirement allows implementations to continue using existing ABIs so long as they include the empty base class optimization. Existing compilers may disable the EBO when the requirement is violated, to avoid the address of the base coinciding with the address of the first member. If the Standard didn't restrict programs this way, libraries and programs would have to be recompiled with updated C++0x compilers… not worth it!
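To illustrate the situation the requirement excludes, here is a hypothetical class `Clash` whose empty base and first member share a type. It is a sketch under two assumptions: a C++11 compiler, and the published C++11 wording of 1.8/6, which (unlike the FCD text quoted above) permits a shared address only when the objects are of different types. The check only asserts that the two addresses differ; which object ends up at a non-zero offset is left to the implementation.

```cpp
#include <cassert>
#include <type_traits>

struct Empty {};

// Violates the rule: the empty base and the first member are both Empty.
// They are distinct objects of the same type, so they must be placed at
// different offsets, which defeats the usual "empty base at offset 0,
// first member at offset 0" layout.
struct Clash : Empty {
    Empty first;
    int value;
};

static_assert(!std::is_standard_layout<Clash>::value,
              "base type matches the first member");

void check_clash_addresses() {
    Clash c{};
    const Empty* base = &c;          // implicit derived-to-base conversion
    const Empty* member = &c.first;
    assert(base != member);          // same type, neither is a subobject of the other
}
```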

Answer:

Two empty base classes with a common base class must produce two instances of the base class at the same address.

I don't think so. In fact, a quick check with my copy of g++ shows two distinct A object addresses; that is, the comparison in your code above does not evaluate to true.

The fact is, we must have two A objects by the way your classes are written. If two objects share the same address they are not two different objects in any meaningful sense. Thus, it is required that distinct addresses exist for the instances of the A object.

Suppose that A is defined like so:

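#include <set>

class A
{
public:
   static std::set<A*> instances;
   A() { instances.insert(this); }
   ~A() { instances.erase(this); }  // std::set has erase(), not remove()
};

std::set<A*> A::instances;  // definition of the static member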

If both copies of A were allowed to share an address, this code would not function as intended: the second insert of the same pointer would be a no-op, and the first destructor to run would remove the entry while the other A still exists. I believe it is situations like these that led to the decision that distinct copies of A should have distinct addresses. Of course, it's the weirdness of situations like this that makes me avoid multiple inheritance.

Winston Ewert
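The registry argument can be exercised directly. This sketch reuses the `instances` idea with the question's `B`/`C`/`D` hierarchy and adds a copy constructor so that copies register too. It assumes a compiler following the published C++11 wording of 1.8/6, under which the two `A` subobjects of `D` must have distinct addresses; `check_registry` is a hypothetical helper.

```cpp
#include <cassert>
#include <set>

// Registry of live A objects, keyed by address.
struct A {
    static std::set<const A*> instances;
    A() { instances.insert(this); }
    A(const A&) { instances.insert(this); }
    ~A() { instances.erase(this); }
};
std::set<const A*> A::instances;

struct B : A {};
struct C : A {};
struct D : B, C {};  // two distinct, non-virtual A subobjects

void check_registry() {
    {
        D d;
        // If both A subobjects shared an address, the second insert would be
        // a no-op and the set would hold a single entry.
        assert(A::instances.size() == 2);
    }
    // Both destructors ran and erased their own entries.
    assert(A::instances.empty());
}
```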
Two objects at the same address are still different in the sense that both are constructed, possibly with different arguments, and both are destroyed. I can't find where I got that guarantee, but at the very least they are allowed to be at the same address, as §1.8/5 says "Base class subobjects may have zero size."
Potatoswatter
I don't see that "Base class subobjects may have zero size." implies that two subobjects of the same type are allowed to share an address. It seems to me that the intent is that B and its A subobject can share an address. I think when we are considering two objects of the same type, they must have distinct addresses.
Winston Ewert
@Winston: See FCD quotes in updated question. In particular "Two distinct objects that are neither bit-fields nor base class subobjects of zero size shall have distinct addresses."
Potatoswatter
About the update: Granted, but the object here is to decide what's supported by the language…
Potatoswatter
Your question asked "Is the last requirement of standard-layout classes, 9/6, necessary or useful?", which is why I added that.
Winston Ewert
Your question seems to be whether or not this was intended to be required in the earlier version of the standard. I'd guess it was, but really I dunno.
Winston Ewert
Ah, sorry. I don't care about what version of the standard (I consider evidence of intent to imply it will be in a future version), but I mean useful in terms of providing a guarantee to all C++ programs even if the ABI is absolutely minimal. More simply, useful in terms of language standardization. I added the word "useful" as an afterthought to the Usenet text because "necessary" seemed too narrow. +1 anyway.
Potatoswatter
Answer:
Steve M
Can you provide a reference that this is required behavior?
Potatoswatter
10.1 [class.mi], paragraph 4: "A base class specifier that does not contain the keyword virtual, specifies a non-virtual base class. A base class specifier that contains the keyword virtual, specifies a virtual base class. For each distinct occurrence of a non-virtual base class in the class lattice of the most derived class, the most derived object (1.8) shall contain a corresponding distinct base class subobject of that type. For each distinct base class that is specified virtual, the most derived object shall contain a single base class subobject of that type."
Steve M
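The distinction 10.1/4 draws between non-virtual and virtual bases can be observed by counting constructor calls. A minimal sketch; the counter `A::constructed`, the types `B1`/`C1`/`D1` and `B2`/`C2`/`D2`, and the helper functions are hypothetical.

```cpp
#include <cassert>

// Counts constructor calls so we can see how many A subobjects exist.
struct A {
    static int constructed;
    A() { ++constructed; }
};
int A::constructed = 0;

// Non-virtual: each occurrence in the lattice yields its own A subobject.
struct B1 : A {};
struct C1 : A {};
struct D1 : B1, C1 {};

// Virtual: B2 and C2 share a single A subobject in the most derived object.
struct B2 : virtual A {};
struct C2 : virtual A {};
struct D2 : B2, C2 {};

int count_a_subobjects_nonvirtual() {
    A::constructed = 0;
    D1 d;
    (void)d;
    return A::constructed;
}

int count_a_subobjects_virtual() {
    A::constructed = 0;
    D2 d;
    (void)d;
    return A::constructed;
}

void check_lattice() {
    assert(count_a_subobjects_nonvirtual() == 2);
    assert(count_a_subobjects_virtual() == 1);

    // With virtual inheritance, both paths reach the same A subobject.
    D2 d;
    assert(static_cast<A*>(static_cast<B2*>(&d)) ==
           static_cast<A*>(static_cast<C2*>(&d)));
}
```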
@Steve: See quotes in updated question. In particular, empty base subobjects are allowed to have identical addresses: "Two distinct objects that are neither bit-fields nor base class subobjects of zero size shall have distinct addresses."
Potatoswatter
@Potatoswatter: They are allowed to occupy 0 bytes, but are not required to. This is a situation where they must be at different addresses (and therefore take up at least a byte), because non-virtual inheritance results in two distinct subobjects.
Steve M
@Steve: Looks like I was in error about being required to… wasn't as good at the language back then. Anyway, even if the distinct subobjects are allowed to be at the same address, then the requirement in question still doesn't guarantee anything (or anything relevant).
Potatoswatter
@Potatoswatter: I've updated my answer to give an example of what is guaranteed.
Steve M
Re update: Still, the standard-layout class `struct X : public C { B b; int i; };` is not equivalent to the C struct `struct X{ C base; B b; int i; };` because the empty base optimization applies either way. If you want the layouts to match, don't use inheritance.
Potatoswatter
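The layout difference behind this comment can be checked with `offsetof`, which C++11 supports for standard-layout classes. A sketch with hypothetical types: `Derived` plays the role of the inheritance version and `Composed` the role of the C-style struct with an explicit `base` member. A data member of empty type is a most derived object and so occupies at least one byte, while the standard-layout guarantee that the object's address is that of its first member forces the empty base to take no space before `i`.

```cpp
#include <cstddef>
#include <type_traits>

struct Empty {};

// Inheritance: Empty is a base class subobject and may have zero size.
struct Derived : Empty {
    int i;
};

// Composition: Empty is a data member and occupies at least one byte.
struct Composed {
    Empty base;
    int i;
};

static_assert(std::is_standard_layout<Derived>::value, "");
static_assert(std::is_standard_layout<Composed>::value, "");

// The object's address equals that of its first non-static data member.
static_assert(offsetof(Derived, i) == 0, "empty base adds no offset");
// i is declared after a member of non-zero size, so it cannot be at offset 0.
static_assert(offsetof(Composed, i) > 0, "empty member takes space");
```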
@Potatoswatter: Ah, good point.
Steve M
I want to say that the answer is in the paragraph on pointer-to-member comparisons in 5.10, but I'm getting too sleepy to figure it out, so I leave this hint here for a smarter person.
Steve M
Answer:
Doug
The relevant point is 9.2/19, "A pointer to a standard-layout struct object, suitably converted using a reinterpret_cast, points to its initial member…". This is probably what led me to say initially "… must produce two instances…". However, the placement of base classes is implementation-defined, and empty bases can go anywhere, as GCC's workaround demonstrates (http://ideone.com/Zy4qj). If the initial member shares its type with an empty base, GCC can apply the same workaround as for another indirect base of the same type and bump the base to a higher address.
Potatoswatter
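A minimal sketch of the 9.2/19 guarantee quoted above, using a hypothetical standard-layout `Record` with an empty base. It relies only on the object's address coinciding with its initial member; where the empty base itself lives is not asserted, since that is the unspecified part discussed here.

```cpp
#include <cassert>
#include <type_traits>

struct Empty {};

// Standard-layout struct with an empty base; its initial member is `first`.
struct Record : Empty {
    int first;
    double second;
};

static_assert(std::is_standard_layout<Record>::value, "");

void check_initial_member() {
    Record r{};
    r.first = 42;
    r.second = 1.5;
    // A pointer to the struct, converted with reinterpret_cast,
    // points to its initial member...
    int* p = reinterpret_cast<int*>(&r);
    assert(p == &r.first);
    assert(*p == 42);
    // ...and vice versa.
    assert(reinterpret_cast<Record*>(p) == &r);
}
```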
@Potatoswatter: Yes, I think bumping up the position of the base class would be a suitable workaround, but it would introduce an ABI special case and backward compatibility breakage in basically every implementation out there. It's possible that the committee thought that the extra hassle wasn't worth it, for one special case.
Doug
@Doug: Bumping up the position of the *empty* base class which has no storage. As I noted, GCC already does it that way. (I just updated the ideone link to demonstrate this case too.) How does forbidding something entirely allow more backward compatibility than any other alternative?
Potatoswatter
"However, as other answers have explained, all base class sub-objects and member sub-objects that are part of the same complete object must be distinct, i.e., have different addresses if they are of the same type." — If you can show that this isn't merely a platform-specific ABI extension, I will definitely select your answer as accepted.
Potatoswatter
I think you're on track though. 1) The empty base class optimization is now required, whereas C++03 POD structs couldn't have bases. 2) Guaranteeing unique addresses between the empty base and the first member requires bumping up the base. (This is independent of unique addresses for same type in general.) 3) Bumping up the empty base might not be implemented as universally as "standard" EBO. Some platforms might just disable EBO in that instance, which would violate standard layout. — So you've basically got it… trim off the excess assertions and I'll select this answer.
Potatoswatter
5.9/2: Does not refer to base subobjects. 5.3.1/1: I like that argument. ***Bingo.*** 1.8/2: Agreed 100%… only base subobjects are in question. 10.1/4: I don't like this argument. Leaving the means of distinction open means that the user might not be able to distinguish the objects; the implementation might just call the constructor/destructor repeatedly with the same address. The problem with "mere intent" and non-normative wording is that it doesn't describe the means to the desired end. This is different from intent described in a DR resolution or working paper.
Potatoswatter
Anyway, the real answer to my intended question is that it's for the sake of ABI compatibility. And, you're right about GCC. I misinterpreted the results of that experiment. There's no way the ABI designers could have guessed that bases should follow the first member in that case. I did verify that the added member bumps the size of the object only if it's the same type as an empty base, but failed to check that the base follows the member.
Potatoswatter