ansaurus

Question

Managing C++ objects in a buffer, considering the alignment and memory layout assumptions

Answer 1

+3 A:

Non-virtual function calls are linked directly just like a C function. The object (this) pointer is passed as the first argument. No knowledge of the object layout is required to call the function.

Steve Fallows 2009-01-06 17:36:33

Yeah, but if you call a non-virtual object method, you need a "this" pointer. So in @Dynite's case, he wants to read in some random bytes, cast them to "FooObj", then call "FooObj::myMethod" and hope that "this" will be a valid object.

Paul Tomblin 2009-01-06 17:44:16

@Paul: Right. Edited to clarify that the pointer will be this. It's up to him to ensure a proper object. As I read his question, the bytes aren't random, but objects previously placed there.

Steve Fallows 2009-01-06 17:51:18

@Steve - I used "random" in a jargony way. I meant that they're bytes that he *hopes* will correspond to a valid class, but with no guarantees. Not even a serialVersionUID to make sure the class definition hasn't changed.

Paul Tomblin 2009-01-06 17:55:02

@Paul - I could use a compile-time assert on the size of the class as a rudimentary check.

Dynite 2009-01-06 17:58:04
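
The compile-time size check suggested in this comment could be sketched roughly as follows. FooRecord and kExpectedFooRecordSize are hypothetical names introduced for illustration, and the expected size of 20 assumes 4-byte alignment for std::int32_t. As the reply below points out, this catches only size changes, not reordered or retyped members.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical record type, used only for illustration.
struct FooRecord {
    std::int32_t a;
    std::int32_t b;
    std::int32_t c;
    std::int32_t d;
    char e;
};

// Size the bytes were written with (assumed: four 4-byte ints, one char,
// three bytes of tail padding).
constexpr std::size_t kExpectedFooRecordSize = 20;

// Compilation fails if any of these assumptions stops holding.
static_assert(std::is_trivially_copyable<FooRecord>::value,
              "FooRecord must be safe to copy byte by byte");
static_assert(std::is_standard_layout<FooRecord>::value,
              "FooRecord must have a predictable member order");
static_assert(sizeof(FooRecord) == kExpectedFooRecordSize,
              "FooRecord changed size since the data was written");
```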

@Dynite, rudimentary is right. You have no check that it's the correct class, or that the class definition hasn't changed since you wrote the bytes to the file, or that the newer version of the compiler hasn't decided to re-arrange the order of things. This is why people use XML etc.

Paul Tomblin 2009-01-06 19:16:26

Answer 2

+1 A:

If the class contains no virtual functions (and therefore class instances have no vptr), and if you make correct assumptions about the way in which the class' member data is laid out in memory, then doing what you're suggesting might work (but might not be portable).
Yes, another way (more idiomatic but not much safer ... you still need to know how the class lays out its data) would be to use the so-called "placement operator new" and a default constructor.

ChrisW 2009-01-06 17:43:26

Answer 3

+3 A:

Basically what you are proposing doing is reading in a bunch of (hopefully not random) bytes, casting them to a known object, and then calling a class method on that object. It might actually work, because those bytes are going to end up in the "this" pointer in that class method. But you're taking a real chance on things not being where the compiled code expects them to be. And unlike Java or C#, there is no real "runtime" to catch these sorts of problems, so at best you'll get a core dump, and at worst you'll get corrupted memory.

It sounds like you want a C++ version of Java's serialization/deserialization. There is probably a library out there to do that.

Paul Tomblin 2009-01-06 17:47:34
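
Short of pulling in a library, a hand-rolled version of the serialize/deserialize idea might look like the sketch below. Point, kPointWireSize, serialize and deserialize are hypothetical names, and the bytes are written in the host's byte order, so this is only an illustration of copying field by field instead of casting the buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical type used only for illustration.
struct Point {
    std::int32_t x;
    std::int32_t y;
};

// Bytes written per Point: the two fields back to back, no padding.
constexpr std::size_t kPointWireSize = 2 * sizeof(std::int32_t);

// Copy each field to a fixed position in the output buffer.
void serialize(const Point& p, unsigned char* out, std::size_t outLen) {
    assert(outLen >= kPointWireSize);
    std::memcpy(out, &p.x, sizeof p.x);
    std::memcpy(out + sizeof p.x, &p.y, sizeof p.y);
}

// Rebuild a Point by copying each field back out of the buffer.
Point deserialize(const unsigned char* in, std::size_t inLen) {
    assert(inLen >= kPointWireSize);
    Point p;
    std::memcpy(&p.x, in, sizeof p.x);
    std::memcpy(&p.y, in + sizeof p.x, sizeof p.y);
    return p;
}
```

Because each field is copied individually, the result does not depend on how the compiler pads Point in memory.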

Answer 4

+6 A:

You can create a constructor that takes all the members and assigns them, then use placement new.

#include <new>   // placement new

class Foo
{
    int a;
    int b;
    int c;
    int d;
    char e;
    unsigned short int *f;
public:
    Foo(int A, int B, int C, int D, char E, unsigned short int *F)
        : a(A), b(B), c(C), d(D), e(E), f(F) {}
};

...
char *buf = new char[sizeof(Foo)];          // pre-allocated buffer
Foo *foo = new (buf) Foo(a, b, c, d, e, f); // construct a Foo inside buf

This has the advantage that even the vtable pointer will be set up correctly, because the constructor actually runs. Note, however, that if you are using this for serialization, the unsigned short int pointer is not going to point at anything useful when you deserialize it, unless you are very careful to convert pointers into offsets and then back again.
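
The pointer-to-offset conversion mentioned above could be sketched like this, assuming the pointed-to data lives in the same contiguous block as the object. toOffset and fromOffset are hypothetical helper names, not part of any library.

```cpp
#include <cassert>
#include <cstddef>

// Turn a pointer into the block into a byte offset from the block's start.
// The offset, not the pointer, is what gets written out.
std::size_t toOffset(const unsigned char* base, const void* p) {
    const unsigned char* bytes = static_cast<const unsigned char*>(p);
    assert(bytes >= base);  // p must point into the block that starts at base
    return static_cast<std::size_t>(bytes - base);
}

// Turn a stored offset back into a pointer, relative to wherever the block
// has been loaded this time.
unsigned char* fromOffset(unsigned char* base, std::size_t offset) {
    return base + offset;
}
```

After loading, the block can sit at a different address and the offsets still resolve correctly, which raw pointers would not.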

Individual methods on a this pointer are statically linked and are simply a direct call to the function with this being the first parameter before the explicit parameters.

Member variables are referenced using an offset from the this pointer. If an object is laid out like this (for example, a class with virtual functions on a 32-bit platform):

0: vtable
4: a
8: b
12: c
etc...

a will be accessed by dereferencing this + 4 bytes.
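
To see the "this + offset" idea in code, here is a small sketch using offsetof on a class without virtual functions (so there is no vtable slot at the front). Plain and readAtOffset are hypothetical names, and the actual offsets are whatever the compiler chooses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical standard-layout type, so offsetof is well defined.
struct Plain {
    std::int32_t a;
    std::int32_t b;
    std::int32_t c;
};

// Read a 32-bit member by adding a byte offset to the object's address,
// which is roughly what compiled member access does.
std::int32_t readAtOffset(const Plain& obj, std::size_t offset) {
    assert(offset + sizeof(std::int32_t) <= sizeof(Plain));
    std::int32_t value;
    std::memcpy(&value,
                reinterpret_cast<const unsigned char*>(&obj) + offset,
                sizeof value);
    return value;
}
```

For example, readAtOffset(p, offsetof(Plain, b)) yields the same value as p.b.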

Eclipse 2009-01-06 17:48:41

Answer 5

A:

That depends upon what you mean by "safe". Any time you cast a memory address into a pointer in this way you are bypassing the type safety features provided by the compiler, and taking the responsibility on yourself. If, as Chris implies, you make an incorrect assumption about the memory layout, or compiler implementation details, then you will get unexpected results and lose portability.

Since you are concerned about the "safety" of this programming style it is likely worth your while to investigate portable and type-safe methods such as pre-existing libraries, or writing a constructor or assignment operator for the purpose.

Richard 2009-01-06 17:51:38

Answer 6

+2 A:

It sounds like you're not storing the objects themselves in a buffer, but rather the data from which they're comprised.

If this data is in memory in the order the fields are defined within your class (with proper padding for the platform) and your type is a POD, then you can memcpy the data from the buffer to a pointer to your type (or possibly cast it, but beware, there are some platform-specific gotchas with casts to pointers of different types).

If your class is not a POD, then the in-memory layout of fields is not guaranteed, and you shouldn't rely on any observed ordering, as it is allowed to change on each recompile.

You can, however, initialize a non-POD with data from a POD.
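
A minimal sketch of both steps, assuming the buffer holds the fields in declaration order with the platform's padding: memcpy the bytes into a POD, then construct the non-POD from it. FooData, Foo and readFooData are hypothetical names chosen for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Plain data exactly as it sits in the buffer (hypothetical type).
struct FooData {
    std::int32_t a;
    std::int32_t b;
};
static_assert(std::is_trivially_copyable<FooData>::value,
              "FooData must be safe to memcpy");

// Non-POD class (it has virtual functions), initialized from the POD.
class Foo {
public:
    explicit Foo(const FooData& d) : a_(d.a), b_(d.b) {}
    virtual ~Foo() {}
    virtual std::int32_t sum() const { return a_ + b_; }
private:
    std::int32_t a_;
    std::int32_t b_;
};

// Copy the raw bytes into a properly aligned FooData instead of casting.
FooData readFooData(const unsigned char* buffer, std::size_t length) {
    assert(length >= sizeof(FooData));
    FooData d;
    std::memcpy(&d, buffer, sizeof d);
    return d;
}
```

Usage would be along the lines of Foo foo(readFooData(buf, len)); the memcpy avoids the alignment and aliasing gotchas of casting the buffer pointer directly.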

As far as the addresses where non-virtual functions are located: they are statically linked at compile time to some location within your code segment that is the same for every instance of your type. Note that there is no "runtime" involved. When you write code like this:

class Foo{
   int a;
   int b;

public:
   void DoSomething(int x);
};

void Foo::DoSomething(int x){a = x * 2; b = x + a;}

int main(){
    Foo f;
    f.DoSomething(42);
    return 0;
}

the compiler generates code that does something like this:

function main:
1. allocate 8 bytes on stack for object "f"
2. call default initializer for class "Foo" (does nothing in this case)
3. push argument value 42 onto stack
4. push pointer to object "f" onto stack
5. make call to function Foo_i_DoSomething@4 (actual name is usually more complex)
6. load return value 0 into accumulator register
7. return to caller
function Foo_i_DoSomething@4 (located elsewhere in the code segment)
1. load "x" value from stack (pushed on by caller)
2. multiply by 2
3. load "this" pointer from stack (pushed on by caller)
4. calculate offset of field "a" within a Foo object
5. add calculated offset to this pointer, loaded in step 3
6. store product, calculated in step 2, to offset calculated in step 5
7. load "x" value from stack, again
8. load "this" pointer from stack, again
9. calculate offset of field "a" within a Foo object, again
10. add calculated offset to this pointer, loaded in step 8
11. load "a" value stored at offset
12. add "a" value, loaded in step 11, to "x" value loaded in step 7
13. load "this" pointer from stack, again
14. calculate offset of field "b" within a Foo object
15. add calculated offset to this pointer, loaded in step 13
16. store sum, calculated in step 12, to offset calculated in step 15
17. return to caller

In other words, it would be more or less the same code as if you had written this (specifics, such as name of DoSomething function and method of passing this pointer are up to the compiler):

class Foo{
    int a;
    int b;

    friend void Foo_DoSomething(Foo *f, int x);
};

void Foo_DoSomething(Foo *f, int x){
    f->a = x * 2;
    f->b = x + f->a;
}

int main(){
    Foo f;
    Foo_DoSomething(&f, 42);
    return 0;
}

P Daddy 2009-01-06 18:03:25

Answer 7

+2 A:

An object of POD type is, in this case, already created (whether or not you call new; allocating the required storage suffices), and you can access its members, including calling a function on that object. But that only works if you know precisely the required alignment of T, the size of T (the buffer may not be smaller than it), and the alignment of all the members of T. Even for a POD type, the compiler is allowed to put padding bytes between members if it wants. For non-POD types, you can have the same luck if your type has no virtual functions or base classes and no user-defined constructor (of course), and the same applies to its bases and all its non-static members too.
For all other types, all bets are off. You have to read values out first with a POD, and then initialize a non-POD type with that data.
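
One way to get the size and alignment right without working them out by hand is to let the compiler supply them. The sketch below assumes C++11 (alignas/alignof); Sample, SampleStorage, isAlignedFor and constructIn are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical POD type used only for illustration.
struct Sample {
    std::int32_t i;
    double d;
};

// Raw storage whose size and alignment are taken from the type itself.
struct SampleStorage {
    alignas(Sample) unsigned char bytes[sizeof(Sample)];
};

// True if p is a multiple of the given alignment.
bool isAlignedFor(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Construct a Sample inside the storage with placement new.
Sample* constructIn(SampleStorage& storage, std::int32_t i, double d) {
    assert(isAlignedFor(storage.bytes, alignof(Sample)));
    return new (storage.bytes) Sample{i, d};
}
```

Sample is trivially destructible, so no explicit destructor call is needed here; a type with a non-trivial destructor would need one before the storage is reused.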

Johannes Schaub - litb 2009-01-06 18:51:00

Answer 8

+2 A:

I am storing objects in a buffer. ... If I know the overall size of the object, is it acceptable to create a pointer to this memory and call functions on it?

This is acceptable to the extent that using casts is acceptable:

#include <iostream>

namespace {
    class A {
        int i;
        int j;
    public:
        int value()
        {
            return i + j;
        }
    };
}

int main()
{
    int buffer[] = { 1, 2 };   // two ints, so the buffer is large enough and aligned for an A
    std::cout << reinterpret_cast<A*>(buffer)->value() << '\n';
}

Casting an object to something like raw memory and back again is actually pretty common, especially in the C world. If you're using a class hierarchy, though, it would make more sense to use pointer to member functions.

say I have the following class: ...

if I know this class to be of size 24 and I know the address of where it starts in memory ...

This is where things get difficult. The size of an object includes the size of its data members (and any data members from any base classes) plus any padding plus any function pointers or implementation-dependent information, minus anything saved from certain size optimizations (empty base class optimization). If the resulting number is 0 bytes, then the object is required to take at least one byte in memory. These things are a combination of language issues and common requirements that most CPUs have regarding memory accesses. Trying to get things to work properly can be a real pain.
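
A few of these size rules can be checked at compile time. The sketch below uses hypothetical types (Empty, CharThenInt) and only asserts what holds generally; the exact amount of padding is up to the compiler.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical types used only for illustration.
struct Empty {};

struct CharThenInt {
    char c;          // typically followed by padding so that i is aligned
    std::int32_t i;
};

// An object with no data members still occupies at least one byte.
static_assert(sizeof(Empty) >= 1, "empty objects are never zero-sized");

// The size covers the members plus whatever padding the compiler inserts.
static_assert(sizeof(CharThenInt) >= sizeof(char) + sizeof(std::int32_t),
              "size includes all members");

// The size is always a multiple of the alignment, so arrays stay aligned.
static_assert(sizeof(CharThenInt) % alignof(CharThenInt) == 0,
              "size is a multiple of alignment");

// Padding bytes in CharThenInt on this compiler (often 3, but not guaranteed).
constexpr std::size_t kCharThenIntPadding =
    sizeof(CharThenInt) - (sizeof(char) + sizeof(std::int32_t));
```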

If you just allocate an object and cast to and from raw memory you can ignore these issues. But if you copy an object's internals to a buffer of some sort, then they rear their head pretty quickly. The code above relies on a few general rules about alignment (i.e., I happen to know that class A will have the same alignment restrictions as ints, and thus the array can be safely cast to an A; but I couldn't necessarily guarantee the same if I were casting parts of the array to A's and parts to other classes with other data members).

Oh, and when copying objects you need to make sure you're properly handling pointers.

You may also be interested in things like Google's Protocol Buffers or Facebook's Thrift.

Yes, these issues are difficult. And, yes, some programming languages sweep them under the rug. But there's an awful lot of stuff getting swept under the rug:

In Sun's HotSpot JVM, object storage is aligned to the nearest 64-bit boundary. On top of this, every object has a 2-word header in memory. The JVM's word size is usually the platform's native pointer size. [An object consisting of only a 32-bit int and a 64-bit double -- 96 bits of data -- will require] two words for the object header, one word for the int, two words for the double. That's 5 words: 160 bits. Because of the alignment, this object will occupy 192 bits of memory.

This is because Sun is relying on a relatively simple tactic for memory alignment issues (on an imaginary processor, a char may be allowed to exist at any memory location, an int at any location that is divisible by 4, and a double may need to be allocated only on memory locations that are divisible by 32 -- but the most restrictive alignment requirement also satisfies every other alignment requirement, so Sun is aligning everything according to the most restrictive requirement).

Another tactic for memory alignment can reclaim some of that space.
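
The rounding in the HotSpot example above (160 bits of data occupying 192 bits) is ordinary align-up arithmetic. A small sketch, with alignUp as a hypothetical helper that assumes the alignment is a power of two:

```cpp
#include <cstddef>

// Round size up to the next multiple of alignment.
// Assumption: alignment is a power of two.
constexpr std::size_t alignUp(std::size_t size, std::size_t alignment) {
    return (size + alignment - 1) & ~(alignment - 1);
}

// 5 words of 32 bits = 160 bits, rounded up to a 64-bit boundary.
static_assert(alignUp(160, 64) == 192, "160 bits round up to 192");
```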

Max Lybbert 2009-01-06 19:13:11
