I'm trying to understand what kind of memory hit I'll incur by creating a large array of objects. I know that each object - when created - will be given space in the HEAP for member variables, and I think that all the code for every function that belongs to that type of object exists in the code segment in memory - permanently.

Is that right?

So if I create 100 objects in C++, I can estimate that I will need space for all the member variables each object owns multiplied by 100 (possible alignment issues here), and then I need space in the code segment for a single copy of the code for each member function of that type of object (not 100 copies of the code).

Do virtual functions, polymorphism, inheritance factor into this somehow?

What about objects from dynamically linked libraries? I assume DLLs get their own stack, heap, code and data segments.

Simple example (may not be syntactically correct):

// parent class
class Bar
{
public:
    Bar()  {}
    virtual ~Bar() {}   // virtual, so deleting a Foo through a Bar* is safe

    // pure virtual function
    virtual void doSomething() = 0;

protected:
    // a protected variable
    int mProtectedVar;
};

// our object class that we'll create multiple instances of
class Foo : public Bar
{
public:
    Foo()  {}
    ~Foo() {}

    // implement pure virtual function
    void doSomething() override { mPrivate = 0; }

    // a couple public functions
    int getPrivateVar()         { return mPrivate; }
    void setPrivateVar(int v)   { mPrivate = v; }

    // a couple public variables
    int mPublicVar;
    char mPublicVar2;

private:
    // a couple private variables
    int mPrivate;
    char mPrivateVar2;
};

About how much memory should 100 dynamically allocated objects of type Foo take including room for the code and all variables?

A:

It's not necessarily true that "each object - when created - will be given space in the HEAP for member variables". Each object you create will take some nonzero space somewhere for its member variables, but where is up to how you allocate the object itself. If the object has automatic (stack) allocation, so too will its data members. If the object is allocated on the free store (heap), so too will be its data members. After all, what is the allocation of an object other than that of its data members?

If a stack-allocated object contains a pointer or other type which is then used to allocate on the heap, that allocation will occur on the heap regardless of where the object itself was created.

For objects with virtual functions, each will have a vtable pointer allocated as if it were an explicitly-declared data member within the class.

As for member functions, the code for those is likely no different from free-function code in terms of where it goes in the executable image. After all, a member function is basically a free function with an implicit "this" pointer as its first argument.

Inheritance doesn't change much of anything.

I'm not sure what you mean about DLLs getting their own stack. A DLL is not a program, and should have no need for a stack (or heap), as objects it allocates are always allocated in the context of a program which has its own stack and heap. That there would be code (text) and data segments in a DLL does make sense, though I am not an expert in the implementation of such things on Windows (which I assume you're using, given your terminology).

John Zwinck
Your assumption "If the object has automatic (stack) allocation, so too will its data members" is not correct. Take a std::vector which is declared as a stack variable. If you insert new values into the vector, it will allocate them on the heap, not on the stack.
jn_
That is why I also said in my answer "If a stack-allocated object contains a pointer or other type which is then used to allocate on the heap, that allocation will occur on the heap regardless of where the object itself was created." std::vector is commonly implemented as containing three pointers.
John Zwinck
you are right, I misunderstood your post
jn_
THANK YOU for a great answer. Your insight has been very helpful. This site is the best! The only extra thing I wish I could add to your answer would be the actual sizes (math) that some of the other answers include.
A: 

Some aspects of this are compiler-vendor dependent, but on most systems all compiled code goes into a section of memory called 'text'. This is separate from both the heap and stack sections (a fourth section, 'data', holds global and static variables and most constants). Instantiating many instances of a class incurs run-time space only for its instance variables, not for any of its functions. If you make use of virtual methods, you will get an additional, but small, bit of memory set aside for the virtual look-up table (or equivalent for compilers that use some other concept). Its size is determined roughly by the number of virtual methods times the number of classes that have them, and is independent of the number of instances at run-time; the only per-instance cost is a hidden pointer to that table in each object.

This is true of statically and dynamically linked code. The actual code all lives in a 'text' region. Most operating systems can actually share DLL code across multiple applications, so if multiple applications are using the same DLLs, only one copy resides in memory and all of them can use it. Obviously there is no additional savings from shared memory if only one application uses the linked code.

TokenMacGuy
Note that "virtual inheritance" has a specific meaning which is different from what you probably meant ("virtual methods").
John Zwinck
Hmm... you're right... edited!
TokenMacGuy
A:

You can't say with complete accuracy how much memory a class, or X objects of it, will take up in RAM.

However, to answer your questions: you are correct that code exists only in one place; it is never "allocated". The code is therefore per class, and exists whether you create objects or not. The size of the code is determined by your compiler, and even then compilers can often be told to optimize for code size, leading to differing results.

Virtual functions are no different, save the (small) added overhead of a virtual method table, which is usually per-class.

Regarding DLLs and other libraries... the rules are no different depending on where the code has come from, so this is not a factor in memory usage.

MattJ
A:

Code exists in the text segment, and how much code is generated based on classes is reasonably complex. A boring class with no virtual inheritance ostensibly has some code for each member function (including those that are implicitly created when omitted, such as copy constructors) just once in the text segment. The size of any class instance is, as you've stated, generally the sum size of the member variables.

Then, it gets somewhat complex. A few of the issues are...

  • The compiler can, if it wants or is instructed, inline code. So even though it might be a simple function, if it's used in many places and chosen for inlining, a lot of code can be generated (spread all over the program code).
  • Virtual methods increase the size of each polymorphic instance: every instance of a class using a virtual method stores a pointer to the class's VTABLE (virtual table), which holds the information for runtime dispatch. The VTABLE itself is per class; it can grow quite large if you have many virtual functions or multiple (virtual) inheritance, and such hierarchies can also need more than one VTABLE pointer per instance (depending on the ancestral type structure of the object).
  • Templates can cause code bloat. Every use of a templated class with a new set of template parameters can generate brand new code for each member that is used. Modern compilers try to collapse this as much as possible, but it's hard.
  • Structure alignment/padding can cause simple class instances to be larger than you expect, as the compiler pads the structure for the target architecture.

When programming, use the sizeof operator to determine object size - never hard code. Use the rough metric of "Sum of member variable size + some VTABLE (if it exists)" when estimating how expensive large groups of instances will be, and don't worry overly about the size of the code. Optimise later, and if any of the non-obvious issues come back to mean something, I'll be rather surprised.

Adam Wright
Adam: You should mention that the VTable size overhead is incurred on a per *type* basis, not a per *object* basis, though each object does need to point to it (so, yes, each object stores a pointer to its class's table, but that is it).
Arafangion
True, I'll clarify this. But multiple inheritance and virtual inheritance can incur more than one VTABLE pointer per instance.
Adam Wright
Thanks for adding detail to the answers for this question! Very useful response and nicely organized.
A: 

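Your estimate is accurate in the base case you've presented. Each class with virtual functions also has a vtable with a pointer for each virtual function, but that table is shared by all instances of the class; each object stores only a single pointer to it, so expect one extra pointer's worth of memory per object, not per virtual function.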

Member variables (and virtual functions) from any base classes are also part of the class, so include them.

Just as in C, you can use the sizeof operator (sizeof(classname) or sizeof(datatype)) to get the size of a class in bytes.

Dan O
A: 

Yes, that's right, code isn't duplicated when an object instance is created. As far as virtual functions go, the proper function call is determined using the vtable, but that doesn't affect object creation per se.

DLLs (shared/dynamic libraries in general) are memory-mapped into the process's memory space. Every modification is carried out as Copy-On-Write (COW): a single DLL is loaded only once into memory, and for every write into a mutable region a private copy of that region (generally one page) is created.

Eduard - Gabriel Munteanu
A:

If compiled as 32-bit, Bar's data member takes 4 bytes (sizeof(Bar) will also count the vtable pointer, so expect 8). Foo adds 10 bytes (2 ints + 2 chars).

Since Foo inherits from Bar, that is at least 4 + 10 = 14 bytes of data.

GCC has attributes for packing structs so there is no padding. In that case 100 entries would take up 1400 bytes + a tiny overhead for aligning the allocation + some overhead for memory management.

If no packed attribute is specified, it depends on the compiler's alignment.

But this doesn't consider the vtable pointer in each object, how much memory the vtable itself takes up, or the size of the compiled code.

neoneye
Thank you for doing the math! Very useful gcc perspective too.
A: 

It's very difficult to give an exact answer to your question, as this is implementation dependent, but approximate values for a 32-bit implementation might be:

int Bar::mProtectedVar;    // 4 bytes
int Foo::mPublicVar;       // 4 bytes
char Foo::mPublicVar2;     // 1 byte
int Foo::mPrivate;         // 4 bytes
char Foo::mPrivateVar2;    // 1 byte

There are alignment issues here: each char is typically followed by 3 bytes of padding so that the next int stays 4-byte aligned, and the final total may well be 20 bytes. You will also have a vptr, say another 4 bytes. So the total size for the data is around 24 bytes per instance. It's impossible to say how much space the code will take up, but you are correct in thinking there is only one copy of the code shared between all instances.

When you ask

I assume dlls get their own stack, heap, code and data segments.

The answer is that there really isn't much difference between data in a DLL and data in an app: basically they share everything between them. This has to be so when you think about it: if they had different stacks (for example), how could function calls work?

anon
Thank you for showing the memory amounts (on a typical PC) and for clarifying the DLL related issue.
A: 

The information given above is of great help and gave me some insight into C++ memory structure. But I would like to add that no matter how many virtual functions a class has, there will be only one VPTR per object and one VTABLE per class (with single inheritance; as noted above, multiple and virtual inheritance can add more). After all, the VPTR points to the VTABLE, so there is no need for more than one VPTR in the case of multiple virtual functions.

Ayush