views:

292

answers:

8

If I want to allocate millions of objects of a class Foo, and I want to be memory- and time-efficient, how should I design the Foo class?

Obviously, Foo should not contain much member data.

Also, I guess, it should not use virtual functions?

And how costly is it for Foo to derive from a Base class? And from several Base classes?

Are there any other tips for making millions of Foo objects very efficient?

+4  A: 

Have a look at the Flyweight pattern. The GoF book does a much better job than Wikipedia at explaining the pattern, though.

CesarGon
+8  A: 

I don't think there's much to say about designing your class for millions of allocations. Yes, there's the obvious memory limit, so if you have a fix amount of memory this might be a real concern for you, otherwise you will always risk running out of memory. A pointer to the virtual table is just that, a pointer (4 or 8 bytes on 32 or 64 bit architecture), not sure this is the case in multiple inheritance. Calling virtual functions has the overhead of the virtual lookup (and extra cache miss, if you didn't use it recently), but only for virtual functions, and they may never be inlined.

If there's a lot of repeated values, you might want to look into having a separate data structure as well (flyweight pattern). And for efficiency, make sure you have a light-weight (inlined) constructor and assignment operators, especially if you intend to use stl vectors and similar.

These are all pretty straightforward stuff, so now for my real suggestion:

What's really going to kill your memory management is if you get fragmentation, where you might all of a sudden have a bunch of memory, but still nowhere to put your objects (not enough contiguous space). If you have a lot of interleaved allocation, this might turn to be a real problem so you might want to look into allocating large blocks of objects, keep them in a pool and reuse. Or use a custom allocator (new-operator), where you preallocate a block of memory which is a multiple of your object size, and use that for your objects.

roe
Shakedown
Agreed. Also, if there's any possible way to make them immutable, doing so would allow for reuse. It may be possible to get away with thousands of objects that are referenced in millions of locations. Hard to know without having more info about the problem domain.
kyoryu
+3  A: 

To the extent that an object can be said to be efficient at all, it should be small if a lot are created, and common operations on it should be inlineable if a lot of calls are made. In terms of memory, virtual functions typically cost 4 or 8 bytes (the size of a pointer) per object for the first one, free thereafter. As others have said, Flyweight is one way of making objects smaller, if they contain duplicate data which can be shared. If your millions of objects don't contain duplicate data, forget about it.

More properly it's code which is efficient or inefficient. Virtual calls probably cost some call overhead, but it's up to the calling code, not the class, how many times each member function is called. In any case inlining is where the big speed gains are, and virtual functions are just one way of obstructing a particular call site benefiting from inlining. If your design is simplified by having 27 virtual member functions, each of which is called once a month, but ensuring that the 2 functions that are called millions of times a second can be inlined by the caller, then there's no need to avoid virtual functions.

Base classes cost pretty much the same as a member object of the same type. With multiple inheritance, static_cast might cease to be a no-op as it typically is for single inheritance, but that's probably not worth worrying about. Virtual inheritance and dynamic_cast can both get interesting in terms of the amount of work done at runtime, but only on a scale similar to virtual member functions.

All that said, the main thing you want to do is get some code running as soon as possible that reasonably imitates the object creation and call characteristics your finished code will have. Then you'll know what kind of performance you're looking at - if your critical operation appears more or less fast enough with the virtual functions in, there's no point coming up with some contorted design scheme just to avoid them. If your profiler tells you that all the time is being spent creating objects, not using them once you have them, then you need to be looking at allocation, not member functions.

The reason to get an estimate early is just that you don't want to waste time designing something that definitely won't work. So basic benchmarks ("can I call new a thousand times a second? A million times a second? Just as fast from multiple threads concurrently?") can feed into the design process, but then of course you can't optimise properly until you have a plausible version of your actual code.

Steve Jessop
+4  A: 

The main thing is to minimize use of new and delete by keeping used objects in a pool and re-using them. Don't worry about virtual functions. The overhead of calling them is usually trivial relative to whatever else is going on in your program.

Mike Dunlavey
And the memory cost of virtual functions is minimal - a single pointer per object (to the vftable). You should be able to eliminate this by not having virtual functions.
kyoryu
@kyoryu: Right. And I couldn't tell what the steady-state number of active objects is, so I couldn't tell if size is even an issue.
Mike Dunlavey
+3  A: 

Just to clarify a point on custom allocators.

With the default new, you are likely to get quite a bit of overhead, that is, extra memory allocated on top of the sizeof(Foo). What you really want to happen is to spread this overhead over many Foos.

The idea is to perform one call to new to allocate a single block of bytes big enough to hold 100's or 1000's (or more?) of contiguous Foos, and then allocate single Foos out of that.

If you keep a pool of pre-allocated Foos, you might still be suffering a memory overhead on each instance, even if its quicker.

In The C++pl2e, Stroustrup talks about bytes 'on my machine', so you'll have to do the experiments yourself: How much memory is actually taken up to allocate a single Foo with new?

Look up Pool Allocators (eg Boost), or Small Object Allocators (eg Loki).

quamrana
Good point, but you don't need to look beyond the standard. `std::deque<T>` will already allocate multiple contiguous blocks of T objects. The number of T's per block is optimized by your compiler vendor, who presumably understands your platform.
MSalters
+1  A: 

Regarding the allocation of your class objects, take a look at the boost Pool library, which may be more efficient for many allocations of small objects than regular new/delete via the system allocator. It may also keep them closer together in memory. The fast_pool_allocator is very handy as a C++ allocator implementation that you can easily drop into your app to gauge the benefit. I'd suggest using allocators anyway, as it will make it easier to compare different allocation schemes.

Another thing to consider is when the objects are going to be created -- eg whether you know in advance how many you're going to need (and therefore a pooling/re-use system as described by others would be useful), or whether you just know that there will be O(1m) of them required at different points. Creating large numbers in one go should be a lot faster than allocating lots of them individually. From my own work I've often found that repeated memory allocation for many small objects shows up as a big bottleneck in profiling.

I'd suggest rigging up a test app that will simulate the number of allocations you're needing and benchmark the different strategies. You may need to combine more than one.

the_mandrill
+1  A: 

Others have mentioned the Flyweight pattern, but since you tagged this for C++, I would look into Boost.Flyweight. Its design fits your need and if you need to reinvent the wheel, you can always check out its source for details.

s1n
+1  A: 

For what it's worth... you might also gain from using another idiom.

I know the Flyweight pattern is quite all the rage, but here you could also benefit from not allocating those millions of objects.

If it seems strange, think of the String object in Python. As in many recent languages, String are immutable in Python. Of course, the object you manipulate can change, but the real String does not: your handle is simply relocated.

Of course, Python has automatic garbage collection which makes it much easier, yet it could work for you too. Here is a sketch:

class FooImpl;

class Foo
{
public:
  explicit Foo(int i): mImpl(FooImpl::Build(i)) {}

  int foo() const { return mImpl->foo(); }

  void foo(int i) { mImpl = mImpl->foo(i); }

private:
  const FooImpl* mImpl;
}; // class Foo

class FooImpl
{
public:
  static const FooImpl* Build(int i)
  { 
    typedef std::unordered_set<FooImpl> foos_type;
    FooImpl tmp(i);
    foos_type::iterator it = gFooImplCollection.insert(tmp);
    return &(*it);
  }

  int foo() const { return mFoo; }

  const FooImpl* foo(int i) const
  {
    return Build(i);
  }

  // Useful thingy
  bool operator==(const FooImpl& rhs) const { return mFoo == rhs.mFoo; }
  size_t hash() const { return mFoo; }

private:
  explicit FooImpl(int i): mFoo(i) {}

  int mFoo;
};

std::unordered_set< FooImpl > gFooImplCollection;

Of course, this is very rough, just to give you an idea. If the potential number of different items is important, you need Garbage Collection.

Garbage Collection being another topic, I prefer to leave you with the idea of an Immutable core class (exposes only const methods) and a mutable handle (which simply changes the core class it points to when asked to change).

And now that you've taken the time to read, Boost has it: Boost.Flyweight :)

Note:

It seems important to precise, because Foo is supposed to be allocated (on the stack) millions of times, its size should remain as close as possible to a pointer. This is accomplished by using Intrusive reference couting (I hope that's what Boost does). Also, it is unnecessary to have virtual methods in Foo, the virtual are in FooImpl and the Build may in fact call an AbstractFactory behind the scenes.

Thus, since Foo:

  • does not have any base class
  • does not have any virtual methods
  • only has one attribute (a pointer)

Its effective size will be the size of the pointer... which is the best you can hope for if you don't want to store an id an incur lookup cost at each call :)

Matthieu M.