views:

347

answers:

7

Hello.

I have been using shared pointers for some time now, and I'm seeing performance issues in my program... So I'd like to know whether shared pointers lead to a performance decrease, and if so, how severe it is. Thanks a lot.

My program is multi-threaded, using std::tr1::shared_ptr

+3  A: 

Shared pointers are reference counted. Particularly when you're using multi-threading, incrementing and decrementing the reference count can take a significant amount of time. The reason multithreading hurts here is that if you passed a shared pointer between threads, the reference count would end up shared between those threads, so any manipulation has to be synchronized between the threads. That can slow things down quite a bit.

Edit: For those who care about how much slower thread interlocking can make some fairly simple operations, see Herb Sutter's testing with a few implementations of CoW Strings. While his testing is far from perfect (e.g. he tested only on Windows), it still gives some idea about the kind of slow-down you can expect. For most practical purposes, you can/could think of a CoW string as something like a shared_ptr<charT>, with a lot of (irrelevant) member functions added.

Jerry Coffin
This is only true if you use a reference counting mechanism that is thread aware. It does not appear that this is the case according to the documentation http://www.boost.org/doc/libs/1_38_0/libs/smart_ptr/shared_ptr.htm#ThreadSafety
JaredPar
He doesn't say what shared_ptr he's using. Some are already thread aware, and some aren't. It's open to question whether the one he's using is or not. That said, you were certainly right that the chances of this being the cause of a performance problem are fairly remote.
Jerry Coffin
Added more information
Guest
Significant amount of time to increment/decrement reference count? Seriously? On x86/x64 if it takes "significant" amount of time to increment/decrement reference count it simply means you are not doing anything else in your program. Interlocked operations might be "significantly" slower than their single-threaded alternatives, but they are still ultra-fast compared to what normal programs do (i.e. read some data from files, any non-trivial computation, etc)
sbk
This is a speculative guess with no hard information. The kind we all make just before profiling to test our guess. Unfortunately, in the real world it does not hold water.
Martin York
It's certainly true that I haven't profiled his program, and attributing a performance problem to shared_ptr without doing so is speculation. That, however, was not the question he asked. What he asked was whether shared_ptr's cause a slow-down, and if so how much. The answer is "yes they can, though exactly how much is almost impossible to predict." I agree that they're probably not the source of his problem, but that's not what he asked.
Jerry Coffin
If people would just measure raw pointer performance vs. shared_ptr, they wouldn't be so critical of your post. But hey, it's the same old ignorance again... shared_ptrs do slow things down, quite heavily, as the price you pay for maintainability.
rama-jka toti
for boost::shared_ptr, on CPU's that support this, reference counting is done with atomic operations which don't need synchronization
stefaanv
We had this discussion at work a few weeks ago - how expensive it is to pass a boost shared_ptr by value rather than by reference. It turns out it can be over 200 times slower due to the lock operations, plus the code bloat for dealing with calling the d'tor should an exception be raised. Of course, passing by reference is just a couple of instructions. BTW the platform is Windows XP with Visual Studio 2005.
Stephen Nutt
@stefaanv: It isn't really that they "don't need synchronization"; rather, they have sufficient synchronization supported directly by the hardware. Direct support by the hardware definitely helps -- but it's still a lot slower than something that doesn't need synchronization at all.
Jerry Coffin
Well, my program is a game server; running a profiler makes it way too slow, so I would have to test it outside of the live server...
Guest
+7  A: 

It's virtually impossible to correctly answer this question given the data. The only way to truly tell what is causing a performance issue in your application is to run a profiler on the program and examine the output.

That being said, it's very unlikely that shared_ptr is causing the slowdown. The shared_ptr type and many early home-grown variants are used in an ever increasing number of C++ programs. I use them myself in my own work (professionally and at home). I've spent a lot of time profiling my work applications, and shared_ptr hasn't ever been even close to a problem in my code or any other code running within the application. It's much more likely that the problem lies elsewhere.

JaredPar
+3  A: 

Very unlikely - you'd have to spend most of your time passing pointers around.

The effect of shared ptrs is usually minor, and it's hard to even construct an edge case where they become a problem (assuming a proper implementation and a properly optimizing compiler).

Impacts of shared ptr:

  • increased allocation size.
    That would matter only if you have many shared pointers to very small objects (say, tens of millions of shared_ptr<int>) and/or are working close to a memory limit. There's a small potential for a notable performance decrease if the extra allocations exceed a cache/NUMA level within an inner loop.

  • increased number of allocations
    shared_ptr allocates a tracking object (reference count, weak count and deleter). This puts pressure on the heap and may cause a general slowdown if you have a high total number of allocations and deallocations.
    This can be avoided by using make_shared, which puts the referent and the tracking object into a single allocation.

  • reference counting
    increases the cost of copying a pointer. In a single-threaded application, you'd notice that only if you spend most of your time copying pointers anyway. In a multithreaded application, you'd additionally need high contention on the same pointers.
    The cost of a copy can be avoided in many places by passing a shared_ptr<T> const &, e.g. as a function argument.

  • dereferencing
    The additional dereferencing cost is zero in a release build of an optimizing compiler. Debug builds often equate to function calls and additional NULL checks. Still, especially in debug builds you'd have to spend most of the time dereferencing pointers for it to make a difference.
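The make_shared and pass-by-const-reference points above can be sketched as follows (Widget and the helper functions are illustrative names, not part of any API):

```cpp
#include <memory>
#include <string>

struct Widget { std::string name; }; // illustrative type

// Two heap allocations: one for the Widget, a second for the
// control block holding the reference counts and deleter.
std::shared_ptr<Widget> make_two_allocs() {
    return std::shared_ptr<Widget>(new Widget);
}

// One heap allocation: make_shared places the Widget and its
// control block in a single block.
std::shared_ptr<Widget> make_one_alloc() {
    return std::make_shared<Widget>();
}

// Taking shared_ptr<T> const& avoids the reference-count
// increment/decrement that a by-value parameter would cost.
std::size_t name_length(const std::shared_ptr<Widget>& w) {
    return w->name.size();
}
```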


Without additional information, we can't help you. You need to describe what the "performance issues" are (general sluggishness, certain operations taking a long time, lots of swapping), and some key figures - what your app does, how many smart pointers there are, how often they get copied, and what other operations you run besides juggling smart pointers.

Or you learn to use performance monitor and/or a profiler to figure out what causes the slowdowns and if there are particular bottlenecks.

peterchen
+2  A: 

If your program appears to have a performance problem, it is perfectly natural to start guessing what the problem could be, but if you want to place a bet, it is almost 100% likely to be something else entirely. Profiling may find the problem. This is the method I use.

Mike Dunlavey
A: 

One thing that might hurt performance is excessively passing shared_ptr as a function parameter. A solution for that is passing a reference to the shared_ptr. However, this is a micro-optimisation, so only do it when really needed.

edit: When thinking about this, there are better ways to optimise:

  • When you find yourself passing the pointer around excessively, you should probably let the object do something instead of dragging it around.
  • You can pass (const) reference to the object instead of the pointer
  • pass a reference to the pointer when the pointer needs to be changed
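A minimal sketch of the last two options, with hypothetical Player/heal/respawn names:

```cpp
#include <memory>

struct Player { int health = 100; }; // hypothetical type

// The callee only uses the object: take a reference to the pointee
// (const where possible) and leave ownership out of the signature.
void heal(Player& p) { p.health += 10; }

// The callee must reseat the pointer itself: take a non-const
// reference to the shared_ptr.
void respawn(std::shared_ptr<Player>& p) {
    p = std::make_shared<Player>(); // fresh Player, full health
}
```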
stefaanv
A: 

Don't guess about performance: Profile your code.

T.E.D.
+2  A: 

If your app is passing around 700 byte XML messages that could be contained in 65 byte Google protocol messages or 85 byte ASN.1 messages then it probably is not going to matter. But if it is processing a million somethings a second then I would not dismiss the cost of adding 2 full read modify write (RMW) cycles to the passing of a pointer.

A full read-modify-write is on the order of 50 ns, so two of them is 100 ns. This is the cost of a lock-inc and a lock-dec - the same as two CASes. That is half the cost of a Windows critical section acquire and release. Compare this to a single push, which takes one machine cycle (400 picoseconds on a 2.5 GHz machine).

And this does not even include the other costs: invalidating the cache line that actually contains the count, the effect of the bus lock on other processors, and so on.

Passing smart pointers by const reference is almost ALWAYS to be preferred. If the callee does not make a new shared pointer when it wants to guarantee or control the lifetime of the pointee, then it is a bug in the callee. To go willy-nilly passing thread-safe reference-counting smart pointers around by value is just asking for performance hits.

The use of reference counted pointers simplifies lifetimes no doubt, but to pass shared pointers by value to try to protect against defects in the callee is sheer and utter nonsense.

Excessive use of reference counting can in short order turn a svelte program that processes 1 million messages per second (mps) into a fat one that handles 150k mps on the same hardware. Suddenly you need half a rack of servers and $10,000 a year in electricity.

You are always better off if you can manage the lifetimes of your objects without reference counting.

An example of a simple improvement: if you are going to fan an object out and you know the breadth of the fanout (say, n), increment the count by n once rather than incrementing individually at each fanout.

BTW when the cpu sees a lock prefix, it really does say "Oh no this is going to hurt".

All that being said, I agree with everyone that you should verify the hot spot.

pgast