  1. In the application we have about 30 types of objects that are created repeatedly.
  2. Some of them are long-lived (hours), some short-lived (milliseconds).
  3. Objects can be created in one thread and destroyed in another.

Does anybody have a clue what a good pooling technique would be, in the sense of minimal creation/destruction latency, low lock contention, and reasonable memory utilization?

Append 1.

1.1. Object pool/memory allocations for one type are usually not related to those of another type (see 1.3 for an exception).

1.2. Memory allocation is performed for only one type (class) at a time, usually for several objects at a time.

1.3. If a type aggregates another type via a pointer (for some reason), these types are allocated together in one contiguous piece of memory.

Append 2.

2.1. Using a collection with access serialization per type is known to perform worse than plain new/delete.

2.2. Application is used on different platforms/compilers and cannot use compiler/platform specific tricks.

Append 3.

It becomes obvious that the fastest (lowest-latency) implementation should organize object pooling as a star-like network of factories, where a central factory is global to the thread-specific factories. Regular object provision/recycling is more efficient to do in a thread-specific factory, while the central factory can be used for balancing objects between threads.

3.1. What is the most effective way to organize communications between the central factory and thread specific factories?

+1  A: 

If you haven't looked at tcmalloc, you might want to take a look. Basing your implementation off of its concepts might be a good start. Key points:

  • Determine a set of size classes. (Each allocation will be fulfilled by using an entry from an equal or greater sized allocation.)
  • Use one size-class per page. (All instances in a page are the same size.)
  • Use per-thread freelists to avoid atomic operations on every alloc/dealloc.
  • When a per-thread freelist is too large, move some of the instances back to the central freelist. Try to move back allocations from the same page.
  • When a per-thread freelist is empty, take some from the central freelist. Try to take contiguous entries.
  • Important: You probably know this, but make sure your design will minimize false sharing.
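A minimal sketch of the per-thread/central freelist split described above (all names — `CentralFreeList`, `ThreadCache`, the batch sizes — are illustrative, not tcmalloc's actual internals; in a real allocator the `Node` header lives inside the freed memory itself):

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>

// A freelist entry; the chain is threaded through the free objects.
struct Node { Node* next; };

// Central freelist shared by all threads; every operation takes the lock,
// so threads should only touch it in batches.
class CentralFreeList {
public:
    // Accept a whole chain of nodes back from an overflowing thread cache.
    void Release(Node* head, Node* tail) {
        std::lock_guard<std::mutex> g(mu_);
        tail->next = head_;
        head_ = head;
    }
    // Detach up to n nodes as one chain; returns nullptr if empty.
    Node* Take(std::size_t n) {
        std::lock_guard<std::mutex> g(mu_);
        Node* head = head_;
        Node* cur = head;
        for (std::size_t i = 1; cur != nullptr && i < n; ++i) cur = cur->next;
        if (cur != nullptr) { head_ = cur->next; cur->next = nullptr; }
        else head_ = nullptr;
        return head;
    }
private:
    std::mutex mu_;
    Node* head_ = nullptr;
};

// Per-thread cache: Pop/Push touch only thread-local state, except when
// refilling from or overflowing to the central list in batches.
class ThreadCache {
public:
    explicit ThreadCache(CentralFreeList* central) : central_(central) {}
    Node* Pop() {
        if (head_ == nullptr) {            // empty: refill in one batch
            head_ = central_->Take(kBatch);
            size_ = Count(head_);
        }
        Node* n = head_;
        if (n != nullptr) { head_ = n->next; --size_; }
        return n;                          // nullptr => caller allocates fresh
    }
    void Push(Node* n) {
        n->next = head_; head_ = n; ++size_;
        if (size_ > kMax) Overflow();      // too large: return a batch
    }
private:
    static const std::size_t kMax = 64;
    static const std::size_t kBatch = 32;
    static std::size_t Count(Node* n) {
        std::size_t c = 0;
        while (n != nullptr) { ++c; n = n->next; }
        return c;
    }
    void Overflow() {
        Node* head = head_;
        Node* tail = head;
        for (std::size_t i = 1; i < kBatch; ++i) tail = tail->next;
        head_ = tail->next; tail->next = nullptr; size_ -= kBatch;
        central_->Release(head, tail);     // one locked call per batch
    }
    CentralFreeList* central_;
    Node* head_ = nullptr;
    std::size_t size_ = 0;
};
```

In a real allocator each thread would own its `ThreadCache` via thread-local storage, and `Take`/`Release` would additionally try to keep entries from the same page together, as the bullet points suggest.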

Additional things you can do that tcmalloc can't:

  • Try to enable locality of reference by using finer-grained allocation pools. For example, if a few thousand objects will be accessed together, then it is best if they are close together in memory. (To minimize cache misses and TLB faults.) If you allocate these instances from their own threadcache, then they should have fairly good locality.

  • If you know in advance which instances will be long-lived and which will not, then allocate them from separate thread caches. If you do not know, then periodically copy the old instances using a threadcache for allocation and update old references to the new instances.

0124816
+2  A: 

To minimize construct/destruct latency, you need fully constructed objects at hand, so you eliminate the new/ctor/dtor/delete time. These "free" objects can be kept in a list so you just push/pop an element at the end.

You can lock the object pools (one for each type) individually. That is a bit more efficient than a system-wide lock, but does not have the overhead of per-object locking.
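A minimal sketch of such a per-type pool (the name `TypedPool` is illustrative): one mutex per pool, free objects kept in a vector and popped/pushed at the back, with plain `new` as the fallback when the pool runs dry.

```cpp
#include <cassert>
#include <mutex>
#include <vector>

// One pool (and one lock) per object type, holding fully constructed
// free instances ready to be handed out.
template <typename T>
class TypedPool {
public:
    T* Acquire() {
        {
            std::lock_guard<std::mutex> g(mu_);
            if (!free_.empty()) {
                T* obj = free_.back();   // reuse the most recently freed one
                free_.pop_back();
                return obj;
            }
        }
        return new T();  // pool empty: pay the construction cost this once
    }
    void Release(T* obj) {
        std::lock_guard<std::mutex> g(mu_);
        free_.push_back(obj);
    }
    ~TypedPool() {
        for (T* obj : free_) delete obj;
    }
private:
    std::mutex mu_;
    std::vector<T*> free_;
};
```

Handing back the most recently released instance first also tends to help cache warmth, since that memory was touched last.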

Denes Tarjan
If the ctor and dtor do not allocate or free memory, then they should be fast, and thus there should be no reason to avoid them. Also, the issue with having one freelist per type is that you must use atomic ops to manipulate it, and it will encourage false sharing.
0124816
Not sure that this is the best answer. IMHO the one about tcmalloc is better. Cannot vote, though.
+3  A: 

I assume you have profiled and measured your code after doing all that creation, and verified that create/destroy is actually causing an issue. Otherwise, that is what you should do first.

If you still want to do object pooling, as a first step you should ensure your objects are stateless, because that is a prerequisite for reusing an object. Similarly, you should ensure that the object and its members have no issues with being used from a thread other than the one that created them (COM STA objects, window handles, etc.).

If you use Windows and COM, one way to use system-provided pooling would be to write free-threaded objects and enable object pooling, which makes the COM+ runtime (earlier known as MTS) do this for you. If you use some other platform like Java, perhaps you could use application servers that define interfaces your objects should implement, and the application server could do the pooling for you.

Or you could roll your own code. But you should first try to find out whether there is an existing pattern for this and, if so, use that instead of what follows below.

If you need to roll your own code, create a dynamically growable collection which tracks the objects already created. Preferably use a vector for the collection, since you would only be adding to it and it would be easy to traverse when searching for a free object (assuming you do not delete objects in the pool). Change the collection type according to your delete policies (e.g., a vector of pointers/references to objects in C++, so that you can delete and recreate an object at the same location).

Each object should be tracked using a flag which can be read in a volatile manner and changed using an interlocked function to mark it as used/not used.

If all objects are used, you need to create a new object and add it to the collection. Before adding, you can acquire a lock (critical section), mark the new object as being used and exit the lock.
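Under those assumptions the scheme might look like the following sketch, using C++11 `std::atomic` in place of platform interlocked functions (the names `FlaggedPool`/`Slot` and the fixed capacity are illustrative; the growable vector from the text is simplified to a preallocated array so that scanning threads never race with reallocation):

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <mutex>

// Pool of default-constructed objects, each guarded by an atomic
// "in use" flag. Acquire scans lock-free and claims a slot with a
// compare-and-swap; the mutex is taken only to publish a new slot.
template <typename T, std::size_t Capacity = 256>
class FlaggedPool {
    struct Slot {
        std::atomic<bool> in_use{false};
        T object{};
    };
public:
    T* Acquire() {
        std::size_t n = count_.load(std::memory_order_acquire);
        for (std::size_t i = 0; i < n; ++i) {
            bool expected = false;  // claim the first free slot via CAS
            if (slots_[i].in_use.compare_exchange_strong(expected, true))
                return &slots_[i].object;
        }
        // Every published slot was busy: lock, mark, publish one more.
        std::lock_guard<std::mutex> g(mu_);
        std::size_t i = count_.load(std::memory_order_relaxed);
        if (i >= Capacity) return nullptr;  // sketch: no growth past Capacity
        slots_[i].in_use.store(true, std::memory_order_relaxed);
        count_.store(i + 1, std::memory_order_release);
        return &slots_[i].object;
    }
    void Release(T* obj) {
        // O(n) lookup keeps the sketch simple and portable.
        std::size_t n = count_.load(std::memory_order_acquire);
        for (std::size_t i = 0; i < n; ++i) {
            if (&slots_[i].object == obj) {
                slots_[i].in_use.store(false, std::memory_order_release);
                return;
            }
        }
    }
private:
    std::array<Slot, Capacity> slots_{};
    std::atomic<std::size_t> count_{0};
    std::mutex mu_;
};
```

If `Acquire` returns null because the fixed capacity is exhausted, the caller would fall back to a plain heap allocation; a real implementation would grow by chaining additional fixed-size blocks instead.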

Measure and proceed: if you implement the above collection as a class, you can easily create different collections for different object types so as to reduce lock contention between threads that do different work.

Finally, you could implement an overloaded class-factory interface that can create all kinds of pooled objects and knows which collection holds which class.

You could then optimize on this design from there.

Hope that helps.

A: 

If you have some guess of the preferred size of the pool, you can create a fixed-size pool using a stack structure built on an array (the fastest possible solution). Then you need to implement four phases of object lifetime: hard initialization (and memory allocation), soft initialization, soft cleanup, and hard cleanup (and memory release). Now in pseudo code:

Object* ObjectPool::AcquireObject()
{
    Object* object = 0;
    lock( _stackLock );
    if( _stackIndex )
        object = _stack[ --_stackIndex ];
    unlock( _stackLock );
    if( !object )
        object = HardInit();
    SoftInit( object );
    return object;
}

void ObjectPool::ReleaseObject(Object* object)
{
    SoftCleanup( object );
    lock( _stackLock );
    if( _stackIndex < _maxSize )
    {
        _stack[ _stackIndex++ ] = object;
        unlock( _stackLock );
    }
    else
    {
        unlock( _stackLock );
        HardCleanup( object );
    }
}

The HardInit/HardCleanup methods perform full object initialization and destruction; they are executed only if the pool is empty or if the freed object cannot fit in the pool because it is full. SoftInit performs soft initialization of the object; it initializes only those aspects of the object that can have changed since it was released. The SoftCleanup method frees resources used by the object that should be released as fast as possible, or resources that can become invalid while their owner resides in the pool. As you can see, locking is minimal: only two lines of code (or only a few instructions) are under the lock.

These four methods can be implemented in separate (template) classes so you can implement fine tuned operations per object type or usage. Also you may consider using smart pointers to automatically return object to its pool when it is no longer needed.
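The smart-pointer idea can be sketched with a `std::unique_ptr` whose custom deleter returns the object to the pool instead of deleting it. The pool below is a deliberately trivial stand-in for the answer's `ObjectPool`, and `PoolDeleter`/`AcquireScoped` are illustrative names:

```cpp
#include <cassert>
#include <memory>
#include <vector>

struct Object { int value = 0; };

class ObjectPool {
public:
    // Trivial stand-ins for the AcquireObject/ReleaseObject above.
    Object* AcquireObject() {
        if (free_.empty()) return new Object();
        Object* o = free_.back();
        free_.pop_back();
        return o;
    }
    void ReleaseObject(Object* o) { free_.push_back(o); }

    // Deleter that recycles the object instead of destroying it.
    struct PoolDeleter {
        ObjectPool* pool;
        void operator()(Object* o) const { pool->ReleaseObject(o); }
    };
    using PooledPtr = std::unique_ptr<Object, PoolDeleter>;

    // Hand out an object that returns itself to the pool automatically
    // when the smart pointer goes out of scope.
    PooledPtr AcquireScoped() {
        return PooledPtr(AcquireObject(), PoolDeleter{this});
    }

    ~ObjectPool() {
        for (Object* o : free_) delete o;
    }
private:
    std::vector<Object*> free_;
};
```

One caveat: the pool must outlive every `PooledPtr` it hands out, since the deleter holds a raw pointer back to the pool.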

Mladen Jankovic
Issues with this are that you will see a lot of contention for the stack lock. Also, false sharing will exist, as allocations from separate threads will be adjacent. In addition, if 3000 temporaries are allocated and then 1 permanent object is allocated, the 3000 temporaries will not be reclaimed.
0124816
I don't get the last part about not reclaiming temporary objects. Can you be more specific?
Mladen Jankovic
I misread the code and misunderstood the answer; the 'no reclaim temporary' part was entirely wrong. :-) However, the above could be improved to use a linked list, thus removing the lock. (Instead use atomic list ops.)
0124816
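The lock-free variant suggested in the comment could look like a Treiber stack of free nodes, built on atomic compare-and-swap. Note that this sketch deliberately ignores the ABA problem, which a production version must address (tagged pointers, hazard pointers, etc.); the names are illustrative.

```cpp
#include <atomic>
#include <cassert>

// A free-list entry; in a pool this header would live inside the
// recycled object's memory.
struct FreeNode { FreeNode* next = nullptr; };

// Lock-free LIFO freelist (Treiber stack). Push and Pop replace the
// lock/unlock pairs of the array-based pool with CAS retry loops.
class LockFreeStack {
public:
    void Push(FreeNode* n) {
        n->next = head_.load(std::memory_order_relaxed);
        while (!head_.compare_exchange_weak(n->next, n,
                                            std::memory_order_release,
                                            std::memory_order_relaxed)) {
            // On failure n->next is reloaded with the current head; retry.
        }
    }
    FreeNode* Pop() {
        FreeNode* n = head_.load(std::memory_order_acquire);
        while (n != nullptr &&
               !head_.compare_exchange_weak(n, n->next,
                                            std::memory_order_acquire,
                                            std::memory_order_relaxed)) {
            // On failure n is reloaded with the current head; retry.
        }
        return n;  // nullptr means the freelist is empty
    }
private:
    std::atomic<FreeNode*> head_{nullptr};
};
```

Reading `n->next` in `Pop` while another thread may have popped and recycled `n` is exactly where the ABA hazard lives; that is why real implementations add generation counters or deferred reclamation.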
A: 

Have you tried the hoard allocator? It provides better performance than the default allocator on many systems.

Anthony Williams
Thank you. It was interesting reading. However, our guys told me that it has heavy lock contention compared to umem on Solaris.
A: 

Why do you have multiple threads destroying objects they did not create? It's a simple way to handle object lifetime, but the costs can vary widely depending on use.

Anyway, if you haven't started implementing this yet, at the very least you can put the create/destroy functionality behind an interface so that you can test/change/optimize it at a later date, when you have more information about what your system actually does.

MSN

The concept is slightly different. An object can travel through the threads handling it, and when its lifetime comes to an end it may be in a thread other than its mother thread. Of course, the after-life management details are hidden.