When coding, what is a good rule of thumb to keep in mind with respect to performance? There are endless ways to optimize for a specific platform and compiler, but I'm looking for answers that apply equally well (or almost) across compilers and platforms.

+11  A: 

The number #1 performance tip is to profile your code early and often. There are a lot of general "don't do this" tips but it's really hard to guarantee this will impact the performance of your application. Why? Every application is different. It's easy to say that passing a vector by value is bad if you have a lot of elements but does your program even use a vector (you probably should but ...)?

Profiling is the only way to understand the performance of your application. I've been in way too many situations where people "optimized" the code but didn't ever profile. The "optimizations" turned out to introduce many bugs and not even be a hot spot in the code path. Waste of everyones time.


A couple of people have commented on the "early" part of my answer. I don't think you should be profiling from day 1. However you should also not be waiting till 1 month from ship either.

I usually first profile once I have a couple of definitive end to end scenarios, or in a larger project, a mostly functional component. I take a day or two (usually working with QA) to get together some large scenarios and throw it at the code. This is a great spot check to find obvious performance problems early. Fixing them at this point is a bit easier.

On a typical project I find that I have code meeting this criterias 30%-40% of the way through the project (100% being in customers hands). I loosely classify this time as early.

+1 for measure, -1 for measure early.
I agree with @Marcin.
Agreed with Marcin. Only measure early if it's really obvious when testing (taking 100s instead of 1s). Only measure when it's a problem, in other words.
Early is relative. I'm not saying profile day #1. But on a 1 year project, i would start profiling no later than month 3 or 4 as opposed to the end.
Wow, no early optimization? Are you guys NUTS? :)One of the most common problems I see during optimization (and I do it for a living) is having to deal with code that has waited to the end to then make it run fast. Like refactoring for clarity, optimization needs to be consider early to avoid...
... inherent performance problems with designs.
There is a big difference between optimizing a design ( O(n!) versus O(log n) ) or optimizing code (re implement some function in assembly)
Patrick Huizinga
Indeed there is a difference, but I have the feeeling that both optimization needs will be pointed out by profiling and careful analysis of the profiling data.
call me Steve

"premature optimization is the root of all evil" (Knuth, Donald)

It really depends on the type of code you write and it's typical usage.

+14  A: 

A famous quote come to mind:

"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil." (Knuth, Donald. Structured Programming with go to Statements, ACM Journal Computing Surveys, Vol 6, No. 4, Dec. 1974. p.268.)

But maybe you should not pass large data structures by value anyway... :-)

Edit: And maybe also avoid O(N^2) or more complex algorithms...
I'd take an O(2) algorithm any day. It only takes twice as long as an O(1)!
or don't write code at all ;)
But watch out for that last 3%, it can be a killer.
Thanks for correcting my big-O notation! :-)
+8  A: 

Don't bother optimizing until it's needed. To find out if it's needed, profile. Don't guess; have proof.

Also, algorithmic optimizations usually have a greater impact than micro ones. Using A-star instead of brute force pathfinding will be faster, just like how Bresenham circles are better than using sin/cos. There are exceptions to these of course but they are very (very) rare (<0.1%). If you have a good design, changing the algorithm changes only one module in your code. Easy.

+1  A: 

one simple sugestion is to get into the habit of doing ++i, rather than i++. i++ makes a copy and this can be expensive.

Ray Tayek
I use this myself for all increments on their own line (which is 90% of -- and ++).
I would like to think that the compilers would optimize away the (unused) return value of i++, but you are correct -- they are two distinct functions when overloaded.
+2  A: 

Keep your code as clean as possible. Compilers are MARVELOUS nowadays. Then, if you do have a perf problem, profile.

All of this is after having chosen the best algorithms available for your problem.

+4  A: 

The best thing you can do as far as performance goes is to start with a solid architecture and threading model. Everything else will be built on this, so if your foundation is crummy, your finished product will only be as good as that. Profiling comes a little later, and even later than that come micro-optimizations (in general, these are insignificant, and complicate code more than anything.)

Moral of the story is: Start with an efficient base, build on top of that cognizant of not doing something downright silly and slow, and you should be alright.

I disagree with your view on micro optimizations. They mostly (for me) involve moving pieces of code around so variables are close to operations modifying them, etc. and also moving things algebraically. I tend to avoid killing code readability when I'm refactoring/optimizing.
If you're just moving stuff around, it's probably something the compiler will do for you anyway.
+2  A: 

Here are a few:

  • Effectively leverage Inlining (depending on your platform).

  • Avoid using temporaries as much as possible (and know what they are)

    x = y + z;

    Would be better optimized if written as:



Also, avoid virtual functions and only create objects when you need to use them.

If you're in the mood, check out Efficient C++. I've got a copy at home from back when I was in school.

Mat Nadrofsky
It should be noted that x, y, and z would be class instances and not PoD, and that the + operator is expensive and temporaries are expensive to construct.
Yep, under that restriction, it makes sense.
Check the facts, today compilers all implements the Return Value Optimization and Named Return Value Optimization. Using "x = y + z" allows you to make "x" a const, which may lead to better optimization, more readability and less bug... May not always be worth removing a "small" temporary.
Vincent Robert
The part about not using temporaries is incorrect. I've often made code faster by using "x=y+z" instead of the other. Profile and look at the machine code prior to making micro changes like that.
Edit: I really meant somewhat incorrect on temporaries. And this does apply to non POD types too.
+9  A: 
  • When possible use if or switch instead of calls through function pointers. Clarification: void doit(int m) { switch(m) { case 1: f1(); break; case 2: f2(); break; } } instead of void doit(void(*m)()) { m(); } can inline the calls.
  • When possible and not harm causing, prefer CRTP to virtual functions
  • When possible, avoid C Strings and use a String class. It will be faster most often. (constant time length "measure", appending amortized constant time, ...)
  • Always pass user defined typed values (apart from where it doesn't make sense. e.g iterators) by reference to const (T const&) instead of copying value.
  • For user defined types, always prefer ++t instead of t++
  • Use const early, often. Most important to improve readability.
  • Try keeping new to a minimum. Always prefer automatic variables (on the stack) if possible
  • Instead of filling arrays yourself, prefer initialization with an empty initializer list like T t[N] = { }; if you want zeros.
  • Use the constructor initializer list as often as possible, especially when initializing user defined typed members.
  • Make use of functors (types with operator() overloaded). They inline better than calls through function pointers.
  • Don't use classes like std::vector or std::string if you have a fixed sized quantity not growing. Use boost::array<T, Size> or a naked array and use it properly.

And indeed, i almost forgot it:

Premature optimization is the root of all evil

Johannes Schaub - litb
Avoid C strings and use a String class for performance? I don't think I've ever converted C string code to a managed string class for performance reasons. It has always been the other way around. For the most part, it boils down to greater control over memory allocations.
Torlack. if you have fixed length strings, the last rule tells not to use vector (or string), but use naked arrays. but if you need to pass strings around, concatenate or grow them, you are better off with string classes. both performance and safety wise.
Johannes Schaub - litb
You'll be pleased to know I'm about to test your pass-by-value point. I have an algorithm that currently passes by value (written for int, int64_t, etc), which I've also used with GMP's mpz_class (uint64_t overflowed). I'll let you know how much performance difference the change makes.
Steve Jessop
It is recursive, btw, so there is quite a lot of copying - I'm expecting it to make a modest difference, it's not a straw man example ;-)
Steve Jessop
onebyone. but note . If your struct has not a custom copy constructor and is rather small, pass by value could win.
Johannes Schaub - litb
As it turns out, run time down from 1.87s to 1.59s, which is no big surprise since (without checking the code) I guess copying mpz_class allocates memory. Not bad for about 30 seconds work.
Steve Jessop
onebyone, thanks for sharing :)
Johannes Schaub - litb
+7  A: 

Use existing, reviewed code that's been used and re-used. (Example: STL, boost vs rolling your own containers and algos)

Updating due to comments: CORRECTLY use existing, reviewed code that's been used and re-used.

The problem with this is licenses. Still, there's enough out there where it shouldn't get in your way (because there are some public domain functions out there), unless all the free-use ones have poor performance. +1 for bring it up, though. =]
+2  A: 

Wikibooks has some things.

A good thing to do is know the efficiency of what you're using. How fast addition is to multiplication, how fast a vector is compared to a normal array or to the higher scales how certain algorithms compare. This allows you to choose the most efficient tool for a task

Demur Rumed
+2  A: 

Using generic algorithms is a great optimization tip - not in terms of runtime but of coding time. Knowing that you can sort(start, end) and expect the range - be it two pointers or iterators to a database - will be sorted (and what's more, the algorithm used will be runtime efficient too). Generic programming is what makes C++ unique and powerful, and you should always keep it in mind. You shouldn't have to write many algorithms because versions already exist (and are likely as fast or faster than anything you would write). If you have other considerations, then you can specialize algos.

+10  A: 

Someone mentioned function pointers (and why you should rather use if). Well, even better: use functors instead, they get inlined and usually have zero overhead. A functor is a structure (or class, but usually the former) that overloads operator () and instances of which can be used just like an ordinary function:

template <typename T>
struct add {
    operator T ()(T const& a, T const& b) const { return a + b; }

int result = add<int>()(1, 2);

These can be used almost in every context where an ordinary function or function pointer could be used. They usually derive from either std::unary_function or std::binary_function but that's often not necessary (and actually only done to inherit some useful typedefs).

EDIT The explicit <int> type qualification is necessary in the above code. Type inference only works for function calls, not for instance creation. However, it can often be omitted by employing a make helper function. This is done in the STL for pairs:

template <typename T1, typename T2>
pair<T1, T2> make_pair(T1 const& first, T2 const& second) {
    return pair<T1, T2>(first, second);

// Implied types:
pair<int, float> pif = make_pair(1, 1.0f);

Someone mentioned in the comments that functors are sometimes called “functionoids”. Yesish – but not quite. In fact, “functor” is a (somewhat weird) abbreviation for “function object”. A functionoid is conceptually similar but realized by employing virtual functions (although they are sometimes used synonymously). For example, a functionoid could look like this (along with its necessary interface definition):

template <typename T, typename R>
struct UnaryFunctionoid {
    virtual R invoke(T const& value) const = 0;

struct IsEvenFunction : UnaryFunctionoid<int, bool> {
    bool invoke(int const& value) const { return value % 2 == 0; }

// call it, somewhat clumsily:
UnaryFunctionoid const& f = IsEvenFunction();
f.invoke(4); // true

Of course, this loses any performance advantage that a functor has because of its virtual function call. It is therefore used in a different context that actually requires a polymorphic (stateful) runtime function.

The C++ FAQ has more to say on this subject.

Konrad Rudolph
I don't think the <int> in your last line is even necessary (I could be wrong), so you don't have to kill readability when using "functors" (I have heard them called "functionids" instead).
Great point! No need to put the inline keyword on your small functions if you are using them by pointers. Pointed functions are NEVER inlined, whatever the keywords used. Keep that in mind and use functors.
Vincent Robert
strager: The `<int>` is necessary, see update to my post. Same goes for functionoids.
Konrad Rudolph
good post. tho my point against function ptrs and for switch/if was more like if(method == 1) return log10(value); else return log(vaue); and not double(*method)(double) = ... method(value); i would still prefer if against boost::function and a function object here (no template can be used!)
Johannes Schaub - litb
litb: Well I'm stumped. o.O … I've never seen anyone make *this* much wasteful usage of any feature. ;-)
Konrad Rudolph
Thanks for the update; I didn't realize that about function objects. =]
Watch out for functors, there is overhead. Often, the compiler can not optimize templates as well as inline code. I've seen simple functors not use registers where inline code did not. This is one of the reasons Scott Meyer's was wrong in the paper about handwritten loops.
@Torlack, I would be really grateful for an example because this is something I have problems imagining. Usually, small (templated) functors are completely inlined and benefit from all register optimizations of the compiler. Are you sure there were unused registers?
Konrad Rudolph
+1  A: 

Consider using a memory pool.

Martin Cote
+3  A: 

Two of the best tips for C++:

Purchase Effective C++, by Scott Meyers.

Then purchase More Effective C++, by Scott Meyers.

+4  A: 

Another point: The fastest code is code that doesn't exist. Which means the more robust and feature-full your project needs to be, the slower it will be. Bottom line: Skip the fluff whenever possible, while making sure you still meet the requirements.

+1  A: 
  1. Always try to think on how your memory looks - for example an array is a consecutive line of memory of the size numOfObjects X sizeof(object). a two dimensional array is n X m X sizeof(object) and each object is in the index of n + m X n and so

    for(int i = 0 ; i < n ; i++){
        for(int j = 0 ; j < m ; j++){
            arr[i,j] = f();

    is much better then (on single process):

    for(int i = 0 ; i < n ; i++){
        for(int j = 0 ; j < m ; j++){
            arr[j,i] = f();

    Because the array is brought into the cache in consecutive chunks the 1st snippet runs on all the cells that are in the cache before fetching the rest while the 2nd snippet would need to fetch new array cells into the cells over and over again

  2. When your application starts to slow use performance benchmark to find the exact bottleneck even a simple GetTickCount calls can be used to determine the time it takes your components to run. On bigger projects use a proper profiler before you start to optimize so you will spend the most optimization effort where it matters.

Dror Helper
+1  A: 

Don't use grossly inefficient algorithms, turn on the optimization in your compiler, don't optimize anything unless a profiler shows it to be a bottleneck, and when you try to improve things test to see if you've done good or bad. Remember also that library functions have usually been optimized by people better at it than you are.

Pretty much everything else is minor, compared to these.

David Thornley
+1  A: 

Make sure the first version of your code is the simplest thing that works.

This will give you maximum possible time under your deadline to optimize properly, by measuring for bottlenecks and low-hanging fruit. By optimizing too soon, you not only complicate your code, you spend half a day agonizing over whether you can reduce 12 memory accesses to 8 in a bit of code called one per graphics cycle, which you could have spend with the profiler identifying and doubling the speed of your app's most common operation. Or even down the pub, because the app was fast enough first time and shipped half a day early.

Exceptions include:

  • When the obvious algorithm is obviously awful. Note that a linear search isn't necessarily in this category just because O(log n) searches exist for some data structures. A linear search of a billion items is in this category, as is a linear search of 1000 items performed more than, say, 10 000 times per second. As is anything with exponential complexity that will ever need to scale significantly.

  • When the obvious algorithm performs an operation which is obviously too slow for the circumstances (e.g. connecting to a server out on the internet somewhere in a redraw function).

  • When you've seen this before, and you know what happens and what the better alternative is. e.g. making copies of large objects like collections instead of passing by pointer or reference.

The following are not exceptions in the first cut of your code, although they might (might) become candidates later:

  • There's a bit-twiddling operation that's two instructions fewer on x86 than the obvious arithmetic. (yippee! A billion more optimizations like that, and we've gained a second! Except that we're looking at op count instead of cycle count, and the two are nigh-on unrelated on modern x86 processors. And the code is now unreadable.)

  • Fewer, longer, functions result in less call overhead. (and more programmer overhead).

  • 32bit arithmetic is faster than 64bit (if all your app does is crunch numbers, then this could become your bottleneck, in which case you'll catch it later. Graphics optimization folks may count this as "seen before, know what happens", but relatively few apps are CPU bound, let alone bound by integer arithmetic. And as computers get bigger and faster, all your arithmetic will speed up, whereas that 2Gig limit on whatever you're counting might start to look cramped).

Steve Jessop
+1  A: 

Basically your biggest performance gains are to be had by algorithmic improvements. This means using the most efficient algorithms and, in turn, the most efficient containers for data items.

Sometimes it is hard to know what the best trade-offs are, but fortunately the designers of the STL had exactly this use case in mind and hence the STL containers are generally flexible enough to allow containers to be mixed and matched in accordance with the demands of the application.

To fully realise the benefit however you need to ensure that you don't expose an internal design choice as part of the interface of your class/module/whatever. None of your clients should depend on your use of a std::vector. At the very least provide a typedef that they (and you) can use, which should allow the vector to be changed to a list (or whatever) depending on your needs.

Similarly make sure that you've got the widest possible choice of debugged algorithms and containers at your disposal. Boost and/or TR1 are prettymuch necessities these days.


Whatever action you take to save a few cycles, remember this: Don't try to be smarter than the compiler - measure to verify the gain.

Johann Gerell

though not addressing the exact issue, some advice is :

always code for interfaces ( when it comes to algorithms ) so that u can smoothly replace them by efficient one (by any req means ).


I agree with the advice on premature optimization. However, there are a few guidelines I like to follow during design because they may be difficult to optimize later:

  • Try to allocate all your objects on start up, minimize the use of new during run time.
  • Design your data structures so that your algorithms have a good shot at being O(1).
  • And as usual, modularize so that you can rip out and replace later. This implies you have a comprehensive suite of unit tests that give you the confidence your new solution is correct.
  • Add performance tests to your unit test suite to make sure you don't inadvertently end up with some O(n*n) code :-)
Mark Beckwith
First point is impracticle. Instead try to reuse. As a rule of thumb, always minimize calls to 'lower' layers.
Vardhan Varma
  • Rule 1: Don't
  • Rule 2: Measure
Roger Lipscombe
+1  A: 


From one of the C++ books i referred (Efficient C++ performance Techniques by Bulka and Mayhew), which explicitly talked about C++ performance aspects. One of them was;

while defining constructors..initialize the other constructors also; some thing like;

class x {

x::x(char *str):m_x(str) {} // and not as x::x(char *str) { m_str(str); }

private: std::string m_x;


The above is some thing which caught my attention and helped me to improve my coding style... this book has more to share on this interesting topic of performance.

-thanks Harish

Does this also improve performance ?
Vardhan Varma

Select best performing algorithms, use less memory, use less branching, use fast operations, use small number of iterations.