ansaurus

Question

In game programming, what are the specific C++ or STL features that causes performance hogs ?

Answer 1

+9 A:

Using STL tends to generate as good if not more efficient code than hand written code for many cases.

Use a profiler to see where you have problems.

Even where C++ STL might perform worse its code is likely to be less error prone. So only write code if the profiler shows there is an issue

Mark 2010-06-28 16:37:26

If programmer don't know what he/she is doing, STL won't make situation better.

SigTerm 2010-06-28 16:44:27

@SigTerm That is true of all programming however leaning STL saves you from writing many algorithms and the much longer time spent in debugging.

Mark 2010-06-28 16:55:24

The main performance trap when using the standard containers is memory allocation. Make sure you understand when the container allocates and frees memory, how to avoid doing it when you don't need to, and how to use custom allocators when you really need fast, dynamic behaviour. The standard algorithms, as Mark says, should be about as fast as can be, as well as generic and (if you're using a popular implementation) well tested.

Mike Seymour 2010-06-28 16:57:49

Nathan Ernst 2010-06-29 01:19:06

Another issue regarding STL is that some of the implementations are different dependant on platform/compiler. std::resize comes to mind which on some implementations (re)allocates memory if truncating the size, and some implementations don't.

Simon 2010-07-30 08:35:17

Answer 2

+2 A:

Actually, you can use classes everywhere, and still get as good of performance as C (and often better performance than typical C).

Most STL is designed to do most of its "tricky" parts at compile time, so the run-time performance is excellent. The main thing to look out for (especially if you write for things like game consoles or mobile phones, that have less capable graphics hardware) is structuring your data to work well with a cache.

Jerry Coffin 2010-06-28 16:39:08

Answer 3

+5 A:

Unless you're building the entire engine from scratch, you're not going to notice a difference in using or not using C++ classes or STL. In general, most of the time the CPU is going to be running code that's not even written by you anyway. Plus the overhead imposed by any scripting languages you implement (Lua, Python, etc) will eclipse any small performance penalty you may incur by using C++/STL.

In short, don't worry about it. Writing good OOP code is better than trying to write super-fast code from the get-go. "Premature optimization is the root of much programming evil."

Timothy Baldridge 2010-06-28 16:39:19

+1 for the optimization comment

obelix 2010-06-28 16:46:23

And the rest of the quote is: "Yet we should not pass up our opportunities in that critical 3%.A good programmer will not be lulled into complacency by such reasoning, he will be wise to look carefully at the critical code; but only after that code has been identified."

NoMoreZealots 2010-06-28 17:17:33

Right....so replacing or not using STL is the correct step? See the rest of my comment. I'm not saying he shouldn't optimize. I'm saying that in the end, the time spent in STL code is going to be dwarfed by the time spent in the rendering engine, physics code and math libraries. And if he's trying to write any of those from scratch, he's going to have bigger issues anyways.

Timothy Baldridge 2010-06-28 17:20:30

And lets not forget Wirth's law: "Software is getting slower more rapidly than hardware becomes faster."

NoMoreZealots 2010-06-28 17:28:46

He's looking for known bottlenecks in STL, not for the most common advice that is given on this site related to optimization in general. Which is typically, don't bother. I write embedded DSP code, no compiler, no library gets you all the way. Most people never take into consideration that if you have 100,000 lines of code that mere 3% is 3000 lines.

NoMoreZealots 2010-06-28 17:40:36

Answer 4

A:

Stroustrup talks about the design and performance of the STL in general, and specifically about the different performance characteristics of the various different container types, in his book The C++ Programming Language (3rd edition).

ChrisW 2010-06-28 16:39:20

Answer 5

+7 A:

1) Debug builds. Severaly slow down many stl containers due to excessive error checking. At least on Microsoft compilers.
2) Excessive dynamic memory allocation. If you have routine that contains std::vector within it AND if you'll call it few thousand times per frame, it will be very slow and bottleneck will be somewhere within operator new or another memory allocation routine. If you'll turn this vector into some kind of static buffer (so you won't have to recreate it every time), it will be much faster. Memory allocation is slow. If you have a buffer, it is normally better to reuse it instead of making a new one during each call.
3) Using inefficient algorithms. For example, using linear search instead of binary search, using wrong sort algorithms (for example, quick sort, heap sort are faster than bubble sort on unsorted data but insertion sort can be faster than quicksort on partially sorted data). Searching instead of using std::map, and so on.
4) Using wrong kind of container. For example, std::vector isn't suitable for inserting elements at random places. std::deque, while is comparable with std::vector (random access), allows fast push_front and push_back, can be 10 times slower than std::vector if you subtract two iterators (on MSVC, again).

SigTerm 2010-06-28 16:43:40

@SigTerm: debugging iterators (essentially) is what slows down debug builds. This requires maintaining extra state, which costs much more than a few range checking here and there.

Matthieu M. 2010-06-28 17:39:29

@Matthieu M: std::deque example in #4 (10x slowdown) happened on release build. Also, I *think* that there are other sanity checks, not only in iterators, but in [] operators, etc.

SigTerm 2010-06-28 17:46:28

There are. But from what I've seen iterators are the most costly. The main issue is that the container should maintain a list of iterators (observer pattern) and check at each update which are invalidated. Often times it's a linked list and it takes O(N) at each update (where N is the number of iterators). Add to the fact that a linked list is not known for its cache-friendliness, and you get yourself a major slowdown.

Matthieu M. 2010-06-29 06:10:42

It's hardly a surprise that `std::deque::iterator::operator-` is 10 times slower than a vector's. The latter is typically a mere pointer operation.

MSalters 2010-06-29 07:50:59

Answer 6

+2 A:

Here are, in my opinion, the key points to writing performant C++/STL code:

Learn what memory allocation strategies are for each STL container,
Learn what algorithms work best with what iterator categories,
Learn run-time polymorphism vs compile-time polymorphism.

Good starting points are:

Nikolai N Fetissov 2010-06-28 16:49:33

Answer 7

A:

I don't have experience in gaming, but Electronic Arts developed their own (non-conforming) implementation of the STL. There is an extensive article explaining the motives and design of the library here.

Note that in many cases, you will be better off by using the STL that comes with your implementation, then measure, then measure again and make sure that you understand what is going on and what is really a performance problem. Then, only then, and if the problem is within the STL (and not in how the STL is used), I would use unstandard libraries.

David Rodríguez - dribeas 2010-06-28 16:53:19

Answer 8

+1 A:

I recommend Effective STL by Scott Myers. The biggest performance hog of STL, is the lack of understanding of it. Please learn it well!

Also see Optimizing software in C++ by Agner Fog for C++ specific performance related topics.

5ound 2010-06-28 16:58:12

Answer 9

A:

These rarely become performance hogs if used correctly. A profiler should always be your primary means of finding bottlenecks in your code short of obvious algorithmic inefficiencies (in which case it's still good practice to use a profiler to make sure if you are on a tight deadline).

There are some legitimate efficiency concerns, however, if you do come across STL usage to show up as a profiler hotspot.

vector<ExpensiveElement> v;
// insert a lot of elements to v
v.push_back(ExpensiveElement(...) );

This push_back immediately above has the worst case scenario of having to linearly copy all the ExpensiveElements inserted so far (if we've exceeded the current capacity). In the best case scenario, we still have to copy ExpensiveElement one time unnecessarily.

We can mitigate the issue by making vector store shared_ptr, but now we pay for two additional heap allocations per ExpensiveElement inserted (one for the reference counter in boost::shared_ptr, and one for ExpensiveElement) along with the overhead of a pointer indirection each time we want to access ExpensiveElement stored in the vector.

To mitigate the memory allocation/deallocation overhead (generally more likely to be a hotspot than an additional level of indirection), we can implement a fast memory allocator for ExpensiveElement. Nevertheless, imagine if std::vector provided an alloc_back method:

new (v.alloc_back()) ExpensiveElement(...);

This would avoid any copy ctor overhead, but is unsafe and prone to abuse. Nevertheless, that's exactly what I did with our vector-clone in response to hotspots. Note that I work in raytracing which is a field where performance is often one of the highest measures of quality (other than correctness) and we profile our code daily so it's not like we just decided out of the blue that vector wasn't efficient enough for our purposes.

We also had no choice but to implement a vector clone because we provide a software development kit where other people's std::vector implementations may be incompatible with our own. I don't want to give you the wrong idea: explore these kinds of solutions only if your profiler sessions really call for it!

Another common source of inefficiency is when using linked STL containers like set, multiset, map, multimap, and list. However, that's not necessarily their fault, but rather the fault of the default std::allocator being used. These perform a separate memory allocation/deallocation per node so the default allocator can be pretty slow for these purposes, especially across multiple threads (thread contention, better off with per-thread memory pools). You can really get a speed boost by writing your own memory allocator (though this is not a trivial thing to do and don't forget alignment if you do).

I can't emphasize enough that these kinds of optimizations should only be applied in response to the profiler. You'll make your code harder to use and maintain this way, so you should be doing it only in exchange for a solid, demonstrable boost in your application's performance.

2010-06-28 16:58:56

"This push_back immediately above has the worst case scenario" std::vector::push_back can be perfectly acceptable. I recently had a code which used std::deque (push_back, random access), but it was slower than std::vector (even with push_back) because iterator subtraction was slower for std::deque, and performance difference made was larger than performance difference for push_backs. So using proifiler is the best idea.

SigTerm 2010-06-28 17:09:01

Answer 10

A:

This book covers what issues you face when using C++ in games.

zooropa 2010-06-28 17:14:13

Answer 11

+1 A:

The other answers are all accurate: the problems with STL and game programming are mostly with misuse.

My general approach is the following: 1. Write it with STL and carefully choose the appropriate algorithm, container, etc. 2. Profile for bottlenecks. 3. If it's STL causing the problem, replace it.

Optimizing too early can really slow you down and only cause more problems later.

Of course, it depends on platform as well. Sometimes, you have to write all of your own stuff because you simply can't afford the extra CPU/RAM overhead of STL.

CrypticPrime 2010-06-28 19:44:50

ansaurus

tags:

views:

answers:

In game programming, what are the specific C++ or STL features that causes performance hogs ?

related questions