Hi, I'm looking for a way to obtain offsets of data members of a C++ class which is of non-POD nature.

Here's why:

I'd like to store data in HDF5 format, which seems most suited for my kind of material (numerical simulation output), but it is perhaps a rather C-oriented library. I want to use it through the C++ interface, which would require me to declare storage types like so (following documentation from here and here (section 4.3.2.1.1)):

class example { 
public:
    double member_a;
    int member_b;
}; //class example

H5::CompType func_that_creates_example_CompType() {
    H5::CompType ct;
    ct.insertMember("a", HOFFSET(example, member_a), H5::PredType::NATIVE_DOUBLE);
    ct.insertMember("b", HOFFSET(example, member_b), H5::PredType::NATIVE_INT);
    return ct;
} //func_that_creates_example_CompType

where HOFFSET is an HDF-specific macro that uses offsetof.

The problem is, of course, that as soon as the example class becomes a little bit more featureful, it is no longer a POD type, and so using offsetof gives undefined results.
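To make the restriction concrete: assuming a C++11 or later compiler (newer than the toolchains discussed in this thread), the standard trait std::is_standard_layout reports whether offsetof is well-defined for a class. The following is only a sketch; plain_example, virtual_example and offset_of_member_b are hypothetical names used for illustration.

```cpp
#include <cstddef>      // offsetof, std::size_t
#include <type_traits>  // std::is_standard_layout

// Non-virtual member functions do not change the layout category.
class plain_example {
public:
    double member_a;
    int member_b;
    double sum() const { return member_a + member_b; }
};

// A single virtual function is enough to stop the class being standard-layout.
class virtual_example {
public:
    double member_a;
    int member_b;
    virtual double get_member_a() { return member_a; }
};

static_assert(std::is_standard_layout<plain_example>::value,
              "offsetof is well-defined for plain_example");
static_assert(!std::is_standard_layout<virtual_example>::value,
              "offsetof is not guaranteed for virtual_example");

// Fine: plain_example is standard-layout, so offsetof may be used on it.
inline std::size_t offset_of_member_b() {
    return offsetof(plain_example, member_b);
}
```

A check like this turns a silent layout assumption into a compile error when someone later adds a virtual function to a stored class.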

The only workaround I can think of is to first export the data I want to store to a simpler struct, then pass that to HDF. That does however involve data copying, which is exactly what HDF is trying to avoid (and why they have this CompType which enables the library to reach into your objects to save their data to file).

So I was hoping you'd have better ideas. Ideally I'd be looking for a portable workaround for this problem, but failing that, if you could give me an idea that works on x86 and x86_64 with GCC I'd already be immensely grateful.

----- appended later: -----

Greg Hewgill suggested below to store the data in a simple struct, then build the actual class by inheriting from that. For HDF specifically, I think that may not practically work. A more elaborate usage scenario than above:

class base_pod {
public:
    double member_a;
    int member_b;
}; //class base_pod

class derived_non_pod : private base_pod {
public:
    //the following method is only virtual to illustrate the problem
    virtual double get_member_a() {return member_a; }
}; //class derived_non_pod

class that_uses_derived_non_pod {
public:
    void whatever();
private:
    derived_non_pod member_c;
}; //class that_uses_derived_non_pod

Now, when we're storing instances of the class that_uses_derived_non_pod, we cannot describe its memory layout as if it had a base_pod as member_c. This would get the offsets wrong because derived_non_pod adds funky stuff (like a virtual function table, I guess?).
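Where the base_pod part actually sits inside derived_non_pod can be measured on a live object instead of being assumed. The sketch below uses public inheritance (an assumption, so that the conversion to base_pod is accessible outside the class); base_pod_offset and read_member_a are hypothetical helper names.

```cpp
#include <cstddef>  // offsetof, std::ptrdiff_t
#include <cstring>  // std::memcpy

struct base_pod {
    double member_a;
    int member_b;
};

// Public inheritance (assumption) so the derived-to-base conversion is accessible.
class derived_non_pod : public base_pod {
public:
    virtual ~derived_non_pod() {}
    virtual double get_member_a() { return member_a; }
};

// Byte distance from the start of a derived_non_pod to its base_pod subobject,
// measured on a temporary object.
inline std::ptrdiff_t base_pod_offset() {
    derived_non_pod d;
    const base_pod* base = &d;  // implicit derived-to-base conversion
    return reinterpret_cast<const char*>(base) -
           reinterpret_cast<const char*>(&d);
}

// Read member_a through raw byte offsets, the way an offset-based library would.
inline double read_member_a(const derived_non_pod& d) {
    const char* bytes = reinterpret_cast<const char*>(&d);
    double value;
    std::memcpy(&value,
                bytes + base_pod_offset() + offsetof(base_pod, member_a),
                sizeof value);
    return value;
}
```

The offset of member_a inside an enclosing object is then the offset of the derived_non_pod member, plus base_pod_offset(), plus offsetof(base_pod, member_a).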

+3  A: 

You could declare the POD types in a base class, then extend that class (perhaps with private inheritance) to add your additional functionality.
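A minimal sketch of that arrangement, with hypothetical names (example_data, read_int_at) and no HDF5 dependency: the stored fields live in a plain struct where offsetof is well-defined, and the richer class only adds behaviour.

```cpp
#include <cstddef>  // offsetof, std::size_t
#include <cstring>  // std::memcpy

// All stored fields live in a plain struct, so offsetof is well-defined here.
struct example_data {
    double member_a;
    int member_b;
};

// The "featureful" class adds behaviour but no extra data members.
class example : public example_data {
public:
    example(double a, int b) { member_a = a; member_b = b; }
    double scaled(double factor) const { return member_a * factor; }
    // Pointer to the plain part; this is what would be handed to the library.
    const example_data* data() const { return this; }
};

// Stand-in for what an offset-based library does: read an int at a byte offset.
inline int read_int_at(const void* base, std::size_t offset) {
    int value;
    std::memcpy(&value, static_cast<const char*>(base) + offset, sizeof value);
    return value;
}
```

Offsets are always taken relative to example_data, and the pointer passed along is the one returned by data(), so nothing depends on how the compiler lays out the derived class.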

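Update to your update: Because an instance of derived_non_pod can also be treated as a base_pod, the offsets of the data members within the base_pod part must be the same. With regard to implementation, your compiler will typically allocate the vtable pointer either before or after the fields of the base_pod when laying out the derived_non_pod structure (GCC, for one, puts it first), so the base_pod part may itself start at a nonzero offset within derived_non_pod.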

It occurs to me that if you use private inheritance, the compiler may be able to choose to reorder the data fields. It's unlikely to do so, however, and making the inheritance protected or public would avoid this possible trap.

Greg Hewgill
Hi, thanks! This is absolutely the cleanest solution for the example I opened with, and I hadn't thought of that. Perhaps it doesn't scale for the case where you have CompType ("composite type") members inside other CompTypes (I'll append to the question).
yungchin
Thanks for coming back and adding the explanation-update, I get it now. Awesome.
yungchin
+1  A: 

The problem is that as soon as your struct/class is not extern "C", C++ compilers are free to rearrange and optimize the layout of your struct/class, so you might end up with a reordered struct depending on your compiler.

There are preprocessor directives (e.g. #pragma pack) for C-like behaviour, but they are not portable in most cases.

Fionn
Just a minor thing: the rearranging applies to all C structs. The way the member variables in a struct in C (or C++) are arranged is totally dependent on the compiler implementation.
Dan Cristoloveanu
There are some restrictions. For example 9.2(12) says that members declared without an intervening access-specifier appear in memory in the order declared (perhaps with intervening bytes). So re-ordering can be prevented. "Totally compiler-dependent" is a good rule of thumb, though.
Steve Jessop
This is interesting - while the compiler is not completely free to reorder members (as onebyone noted), I was quite surprised that reordering of data members was allowed at all. I learned something new today.
Michael Burr
+4  A: 

Greg Hewgill's solution is probably preferable to this (maybe with composition rather than inheritance).

However, I think that with GCC on x86 and x86_64, offsetof will actually work even for members of non-POD types, as long as it "makes sense". So for example it won't work for members inherited from virtual base classes, because in GCC that's implemented with an extra indirection. But as long as you stick to plain public single inheritance, GCC just so happens to lay out your objects in a way which means every member is accessible at an offset from the object pointer, so the offsetof implementation will give the right answer.

Trouble with this of course is that you have to ignore the warnings, which means if you do something that doesn't work, you'll dereference a close-to-null pointer. On the plus side, the cause of the problem will probably be obvious at runtime. On the minus side, eeew.

[Edit: I've just tested this on gcc 3.4.4, and actually the warning is upgraded to an error when getting the offset of a member inherited from a virtual base class. Which is nice. I'd still be slightly worried that a future version of gcc (4, even, which I don't have to hand) will be more strict, and that if you take this approach your code may in future stop compiling.]

Steve Jessop
Thanks! I work with gcc 4.2.3 and as far as I can tell it is about as strict; I get lots of warnings, no errors so far. It works for the simpler classes but I have yet to get my head around ones with ABCs (so the class to store is ABC-derived, not the members). Will update when/if it works.
yungchin
+2  A: 

Depending on how portable you want to be, you can use offsetof() even on non-POD types. It's not strictly conformant, but given the way offsetof() is implemented on gcc and MSVC, it works with non-POD types in the current versions and those of the recent past.

Roel
A: 

Would using a pointer to member work instead of offsetof()? I know that you'd probably have to do all sorts of casting to be able to actually use the pointer, since I'm guessing that insertMember is acting on the type specified in the last parameter at runtime.

But with your current solution you're already going around the type system, so I'm not sure that you're losing anything there. Except that the syntax for pointers to member is hideous.

Michael Burr
Hi, thanks! I think it would not work in this case, because CompType::insertMember() doesn't take an instance of the class, it just takes the byte offset of the member. The type in the last parameter describes the layout of the new member; it could also be another CompType (for composition).
yungchin
+1  A: 

I'm pretty sure that Roel's answer along with consideration for onebyone's answer covers most of what you ask.

#include <cstddef>   // offsetof
#include <iostream>

struct A
{
  int i;
};

class B: public A
{
public:
  virtual void foo ()
  {
  }
};

int main ()
{
  std::cout << offsetof (B, A::i) << std::endl;
}

With g++, the above outputs 4, which is what you'd expect if B has a vtable before the base class member 'i'.

It should be possible though to calculate the offset manually, even for the case where there are virtual bases:

#include <cstddef>   // std::ptrdiff_t
#include <iostream>

struct A1 {
  int i;
};

struct A2 {
  int j;
};

struct A3 : public virtual A2 {
};

class B: public A1, public A3 {
public:
  virtual void foo () {
  }
};

template <typename MostDerived, typename C, typename M>
std::ptrdiff_t calcOffset (M C::* member)
{
  // Measure on a live most-derived object so that vtable pointers and
  // virtual bases are accounted for.
  MostDerived d;
  return reinterpret_cast<char*> (&(d.*member)) - reinterpret_cast<char*> (&d);
}

int main ()
{
  std::cout << calcOffset<B> (&A2::j) << ", " 
            << calcOffset<B> (&A1::i) << std::endl;
}

With g++, this program outputs 4 and 8. Again this is consistent with the vtable as the first member of B followed by the virtual base A2 and its member 'j'. Finally the non virtual base A1 and its member 'i'.

The key point is that you always calculate the offsets based on the most derived object, i.e. B. If the members are private then you may need to add a "getMyOffset" call for each member. This call performs the calculation at a point where the name is accessible.
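A rough sketch of such a per-member offset call, using a hypothetical class named sample whose data members are private. The static functions live inside the class, so the private names are accessible where the calculation happens.

```cpp
#include <cstddef>  // std::ptrdiff_t

class sample {
public:
    sample() : member_a_(0.0), member_b_(0) {}
    virtual ~sample() {}

    // Computed inside the class, where the private members can be named.
    static std::ptrdiff_t offset_of_member_a() {
        sample s;
        return byte_distance(&s, &s.member_a_);
    }
    static std::ptrdiff_t offset_of_member_b() {
        sample s;
        return byte_distance(&s, &s.member_b_);
    }

private:
    // Distance in bytes between two addresses inside the same object.
    static std::ptrdiff_t byte_distance(const void* from, const void* to) {
        return static_cast<const char*>(to) - static_cast<const char*>(from);
    }

    double member_a_;
    int member_b_;
};
```

Each call constructs a temporary object, so this assumes the class is default-constructible and cheap to build; the results could be cached if that is not the case.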

You might find the following useful too. I think it's nice to associate all of this with the object that you're building the HDF type for:

struct H5MemberDef
{
  const char * member_name;
  ptrdiff_t offset;
  H5::PredType h5_type;
};


class B  // ....
{
public:

  // ...

  static H5MemberDef memberDef[];
};

H5MemberDef B::memberDef[] = {
  { "i", calcOffset<B> (&A1::i), H5::PredType::NATIVE_INT }
  , { "j", calcOffset<B> (&A2::j), H5::PredType::NATIVE_INT }
  , { 0, 0, H5::PredType::NATIVE_INT }
};

And then you can build the H5type via a loop:

H5::CompType func_that_creates_example_CompType(H5MemberDef * pDef) {
  H5::CompType ct;
  while (pDef->member_name != 0)  // stop at the sentinel entry, whose name is a null pointer
  {
    ct.insertMember(pDef->member_name, pDef->offset, pDef->h5_type);
    ++pDef;
  }
  return ct;
}

Now if you add a member to B or one of its bases, then a simple addition to this table will result in the correct HDF type being generated.
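The table-plus-loop idea can be tried without HDF5 installed. In the sketch below, member_def, record and describe are hypothetical stand-ins: the type is carried as a string instead of an H5 type, and the loop collects descriptions where a real version would call insertMember.

```cpp
#include <cstddef>  // offsetof, std::size_t
#include <string>
#include <vector>

struct record {
    double member_a;
    int member_b;
};

// Hypothetical stand-in for the member table; type_name replaces the HDF5 type.
struct member_def {
    const char* member_name;  // a null pointer marks the end of the table
    std::size_t offset;
    const char* type_name;
};

static const member_def record_members[] = {
    { "a", offsetof(record, member_a), "double" },
    { "b", offsetof(record, member_b), "int" },
    { 0, 0, 0 }  // sentinel entry
};

// Walks the table until the sentinel; a real version would call insertMember here.
inline std::vector<std::string> describe(const member_def* def) {
    std::vector<std::string> lines;
    for (; def->member_name != 0; ++def) {
        lines.push_back(std::string(def->member_name) + " @ " +
                        std::to_string(def->offset) + " : " + def->type_name);
    }
    return lines;
}
```

Calling describe(record_members) yields one line per real entry and skips the sentinel, which is the same termination rule the H5 loop relies on.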

Richard Corden