When coding, what is a good rule of thumb to keep in mind with respect to performance? There are endless ways to optimize for a specific platform and compiler, but I'm looking for answers that apply equally well (or almost) across compilers and platforms.
The number one performance tip is to profile your code early and often. There are a lot of general "don't do this" tips, but it's really hard to guarantee that any of them will affect the performance of your application. Why? Every application is different. It's easy to say that passing a vector by value is bad when it holds a lot of elements, but does your program even use a vector (it probably should, but ...)?
Profiling is the only way to understand the performance of your application. I've been in way too many situations where people "optimized" the code but never profiled it. The "optimizations" turned out to introduce many bugs and weren't even in a hot spot in the code path. A waste of everyone's time.
EDIT:
A couple of people have commented on the "early" part of my answer. I don't think you should be profiling from day one. However, you also should not wait until a month before ship.
I usually first profile once I have a couple of definitive end-to-end scenarios, or, in a larger project, a mostly functional component. I take a day or two (usually working with QA) to put together some large scenarios and throw them at the code. This is a great spot check for finding obvious performance problems early. Fixing them at this point is a bit easier.
On a typical project I find that I have code meeting these criteria 30%-40% of the way through the project (100% being in customers' hands). I loosely classify this time as early.
"premature optimization is the root of all evil" (Knuth, Donald)
It really depends on the type of code you write and its typical usage.
A famous quote comes to mind:
"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil." (Knuth, Donald. Structured Programming with go to Statements, ACM Journal Computing Surveys, Vol 6, No. 4, Dec. 1974. p.268.)
But maybe you should not pass large data structures by value anyway... :-)
Edit: And maybe also avoid O(N^2) or more complex algorithms...
Don't bother optimizing until it's needed. To find out if it's needed, profile. Don't guess; have proof.
Also, algorithmic optimizations usually have a greater impact than micro-optimizations. Using A* instead of brute-force pathfinding will be faster, just as Bresenham's circle algorithm beats calling sin/cos. There are exceptions, of course, but they are very (very) rare (<0.1%). If you have a good design, changing the algorithm changes only one module in your code. Easy.
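To make that scale difference concrete, here is a minimal sketch (the function names are illustrative, not from the answer above) contrasting a linear scan with a binary search over the same sorted data; the win comes from the algorithm, O(n) versus O(log n), not from any micro-tweak:

#include <algorithm>
#include <vector>

// Linear scan: O(n) comparisons per lookup.
bool containsLinear(const std::vector<int>& v, int key) {
    return std::find(v.begin(), v.end(), key) != v.end();
}

// Binary search: O(log n) comparisons per lookup.
// Precondition: v is sorted (e.g. by std::sort).
bool containsBinary(const std::vector<int>& v, int key) {
    return std::binary_search(v.begin(), v.end(), key);
}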
One simple suggestion is to get into the habit of writing ++i rather than i++. For user-defined types, i++ makes a copy, and that can be expensive.
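For built-in types like int the compiler optimizes the copy away, so the habit matters mostly for user-defined types such as iterators. A minimal sketch of why (SlowInt is an illustrative type, not from the answer above):

struct SlowInt {
    int value;

    // Prefix: increments in place and returns itself. No copy.
    SlowInt& operator++() {
        ++value;
        return *this;
    }

    // Postfix: must return the old value, so it makes a copy first.
    SlowInt operator++(int) {
        SlowInt old = *this;
        ++value;
        return old;
    }
};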
Keep your code as clean as possible. Compilers are MARVELOUS nowadays. Then, if you do have a perf problem, profile.
All of this is after having chosen the best algorithms available for your problem.
The best thing you can do as far as performance goes is to start with a solid architecture and threading model. Everything else will be built on this, so if your foundation is crummy, your finished product will only be as good as that. Profiling comes a little later, and even later than that come micro-optimizations (in general, these are insignificant, and complicate code more than anything.)
Moral of the story: start with an efficient base, build on top of it while taking care not to do anything downright silly and slow, and you should be all right.
Here are a few:
Effectively leverage inlining (depending on your platform).
Avoid using temporaries as much as possible (and know what they are). For a user-defined type,
x = y + z;
would be better optimized if written as:
x = y;
x += z;
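The effect is easiest to see with a class type such as std::string, where operator+ has to construct a whole temporary object before the assignment, while the two-step form appends in place. A hedged sketch of the reasoning (this is the classic pre-move-semantics argument, not a measurement):

#include <string>

void concat(const std::string& y, const std::string& z) {
    std::string x;
    x = y + z;  // operator+ builds a temporary string, which is assigned from and then destroyed
    x = y;      // copy assignment...
    x += z;     // ...then append in place: no temporary is created
}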
Also, avoid virtual functions and only create objects when you need to use them.
If you're in the mood, check out Efficient C++. I've got a copy at home from back when I was in school.
- When possible, use if or switch instead of calls through function pointers. Clarification:
void doit(int m) { switch(m) { case 1: f1(); break; case 2: f2(); break; } }
instead of
void doit(void(*m)()) { m(); }
can inline the calls.
- When possible, and where it causes no harm, prefer CRTP to virtual functions (a sketch follows this list).
- When possible, avoid C strings and use a string class. It will be faster most often (constant-time length query, amortized constant-time appending, ...).
- Always pass values of user-defined types by reference to const (T const&) instead of copying them, apart from where it doesn't make sense (e.g. iterators).
- For user-defined types, always prefer ++t to t++.
- Use const early and often. Most importantly, it improves readability.
- Try to keep new to a minimum. Always prefer automatic variables (on the stack) if possible.
- Instead of filling arrays yourself, prefer initialization with an empty initializer list like T t[N] = { }; if you want zeros.
- Use the constructor initializer list as often as possible, especially when initializing members of user-defined types.
- Make use of functors (types with operator() overloaded). They inline better than calls through function pointers.
- Don't use classes like std::vector or std::string if you have a fixed-size quantity that doesn't grow. Use boost::array<T, Size> or a plain array, and use it properly.
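As referenced in the CRTP bullet above, here is a minimal sketch of the pattern (Shape and Circle are illustrative names, not from the answer itself). The base class calls into the derived class through a static_cast instead of a vtable, so the compiler can see exactly which function is called and inline it:

template <typename Derived>
struct Shape {
    // Static polymorphism: the call target is known at compile time and can be inlined.
    double area() const {
        return static_cast<const Derived*>(this)->areaImpl();
    }
};

struct Circle : Shape<Circle> {
    double radius;
    explicit Circle(double r) : radius(r) {}
    double areaImpl() const { return 3.14159265358979 * radius * radius; }
};

Usage: Circle c(2.0); c.area() computes the area with no virtual dispatch involved.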
And indeed, I almost forgot it:
Premature optimization is the root of all evil
Use existing, reviewed code that's been used and re-used. (Example: STL, boost vs rolling your own containers and algos)
Updating due to comments: CORRECTLY use existing, reviewed code that's been used and re-used.
A good thing to do is to know the efficiency of what you're using: how fast addition is compared to multiplication, how a vector compares to a plain array, or, at a higher level, how certain algorithms compare to one another. This allows you to choose the most efficient tool for a task.
Using generic algorithms is a great optimization tip - not in terms of runtime but of coding time. Knowing that you can call sort(start, end) and expect the range - be it delimited by two pointers or by iterators into a database - to be sorted (and, what's more, that the algorithm used will be runtime-efficient too) is valuable. Generic programming is what makes C++ unique and powerful, and you should always keep it in mind. You shouldn't have to write many algorithms yourself, because versions already exist (and are likely as fast as or faster than anything you would write). If you have other considerations, you can specialize the algorithms.
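A small illustration of that genericity, using nothing beyond the standard library: the very same std::sort works on a plain array (a range delimited by pointers) and on a std::vector (a range delimited by iterators):

#include <algorithm>
#include <vector>

void demo() {
    int raw[5] = { 3, 1, 4, 1, 5 };
    std::sort(raw, raw + 5);        // range given as a pair of pointers

    std::vector<int> v(raw, raw + 5);
    std::sort(v.begin(), v.end());  // same algorithm over an iterator range
}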
Someone mentioned function pointers (and why you should rather use if). Well, even better: use functors instead; they get inlined and usually have zero overhead. A functor is a structure (or class, but usually the former) that overloads operator(), so that its instances can be used just like an ordinary function:
template <typename T>
struct add {
    T operator ()(T const& a, T const& b) const { return a + b; }
};
int result = add<int>()(1, 2);
These can be used in almost every context where an ordinary function or function pointer could be used. They usually derive from either std::unary_function or std::binary_function, but that's often not necessary (and is actually only done to inherit some useful typedefs).
EDIT: The explicit <int> type qualification is necessary in the above code. Type inference only works for function calls, not for instance creation. However, it can often be omitted by employing a make helper function. This is done in the STL for pairs:
template <typename T1, typename T2>
pair<T1, T2> make_pair(T1 const& first, T2 const& second) {
return pair<T1, T2>(first, second);
}
// Implied types:
pair<int, float> pif = make_pair(1, 1.0f);
Someone mentioned in the comments that functors are sometimes called “functionoids”. Yesish – but not quite. In fact, “functor” is a (somewhat weird) abbreviation for “function object”. A functionoid is conceptually similar but realized by employing virtual functions (although they are sometimes used synonymously). For example, a functionoid could look like this (along with its necessary interface definition):
template <typename T, typename R>
struct UnaryFunctionoid {
virtual R invoke(T const& value) const = 0;
};
struct IsEvenFunction : UnaryFunctionoid<int, bool> {
bool invoke(int const& value) const { return value % 2 == 0; }
};
// call it, somewhat clumsily:
UnaryFunctionoid<int, bool> const& f = IsEvenFunction();
f.invoke(4); // true
Of course, this loses any performance advantage that a functor has because of its virtual function call. It is therefore used in a different context that actually requires a polymorphic (stateful) runtime function.
The C++ FAQ has more to say on this subject.
Two of the best tips for C++:
Purchase Effective C++, by Scott Meyers.
Then purchase More Effective C++, by Scott Meyers.
Another point: The fastest code is code that doesn't exist. Which means the more robust and feature-full your project needs to be, the slower it will be. Bottom line: Skip the fluff whenever possible, while making sure you still meet the requirements.
Always try to think about how your memory is laid out. For example, an array is a consecutive block of memory of size numOfObjects * sizeof(object). A two-dimensional array is n * m * sizeof(object), and element (i, j) lives at offset i * m + j, so

for (int i = 0; i < n; i++) {
    for (int j = 0; j < m; j++) {
        arr[i][j] = f();
    }
}

is much better than (on a single processor):

for (int i = 0; i < n; i++) {
    for (int j = 0; j < m; j++) {
        arr[j][i] = f();
    }
}

because the array is brought into the cache in consecutive chunks: the first snippet works through cells that are already in the cache before fetching more, while the second snippet has to fetch new array cells into the cache over and over again.
When your application starts to slow down, use performance benchmarks to find the exact bottleneck; even simple GetTickCount calls can be used to determine the time it takes your components to run. On bigger projects, use a proper profiler before you start to optimize, so you spend the most optimization effort where it matters.
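A minimal sketch of that spot-check technique on Windows (RunComponent is a hypothetical stand-in for your own code, and note that GetTickCount has a resolution of roughly 10-16 ms, so it only suits coarse measurements):

#include <windows.h>
#include <iostream>

void RunComponent() {  // hypothetical stand-in: replace with the component you want to time
    Sleep(50);         // simulate 50 ms of work
}

int main() {
    DWORD start = GetTickCount();             // milliseconds since system start
    RunComponent();
    DWORD elapsed = GetTickCount() - start;   // coarse elapsed time in milliseconds
    std::cout << "RunComponent took " << elapsed << " ms" << std::endl;
    return 0;
}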
Don't use grossly inefficient algorithms, turn on optimization in your compiler, don't optimize anything unless a profiler shows it to be a bottleneck, and when you try to improve things, test whether you've actually made them better or worse. Remember also that library functions have usually been optimized by people better at it than you are.
Pretty much everything else is minor, compared to these.
Make sure the first version of your code is the simplest thing that works.
This will give you the maximum possible time under your deadline to optimize properly, by measuring for bottlenecks and low-hanging fruit. By optimizing too soon, you not only complicate your code, you also spend half a day agonizing over whether you can reduce 12 memory accesses to 8 in a bit of code called once per graphics cycle - time you could have spent with the profiler, identifying and doubling the speed of your app's most common operation. Or even down the pub, because the app was fast enough the first time and shipped half a day early.
Exceptions include:
When the obvious algorithm is obviously awful. Note that a linear search isn't necessarily in this category just because O(log n) searches exist for some data structures. A linear search of a billion items is in this category, as is a linear search of 1000 items performed more than, say, 10 000 times per second. As is anything with exponential complexity that will ever need to scale significantly.
When the obvious algorithm performs an operation which is obviously too slow for the circumstances (e.g. connecting to a server out on the internet somewhere in a redraw function).
When you've seen this before, and you know what happens and what the better alternative is. e.g. making copies of large objects like collections instead of passing by pointer or reference.
The following are not exceptions in the first cut of your code, although they might (might) become candidates later:
There's a bit-twiddling operation that's two instructions fewer on x86 than the obvious arithmetic. (yippee! A billion more optimizations like that, and we've gained a second! Except that we're looking at op count instead of cycle count, and the two are nigh-on unrelated on modern x86 processors. And the code is now unreadable.)
Fewer, longer, functions result in less call overhead. (and more programmer overhead).
32-bit arithmetic is faster than 64-bit (if all your app does is crunch numbers, this could become your bottleneck, in which case you'll catch it later. Graphics optimization folks may count this as "seen before, know what happens", but relatively few apps are CPU-bound, let alone bound by integer arithmetic. And as computers get bigger and faster, all your arithmetic will speed up, whereas that 2 GB limit on whatever you're counting might start to look cramped).
Basically your biggest performance gains are to be had by algorithmic improvements. This means using the most efficient algorithms and, in turn, the most efficient containers for data items.
Sometimes it is hard to know what the best trade-offs are, but fortunately the designers of the STL had exactly this use case in mind and hence the STL containers are generally flexible enough to allow containers to be mixed and matched in accordance with the demands of the application.
To fully realise the benefit, however, you need to ensure that you don't expose an internal design choice as part of the interface of your class/module/whatever. None of your clients should depend on your use of a std::vector. At the very least, provide a typedef that they (and you) can use, which should allow the vector to be changed to a list (or whatever) depending on your needs.
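A sketch of that decoupling, with illustrative names (not from the answer above): clients only ever name the typedef, so the underlying container can be swapped with a one-line change:

#include <vector>

class TaskQueue {
public:
    // Clients write TaskQueue::container_type, never std::vector directly,
    // so this single line can later switch to std::deque, std::list, etc.
    typedef std::vector<int> container_type;

    const container_type& tasks() const { return tasks_; }

private:
    container_type tasks_;
};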
Similarly, make sure that you've got the widest possible choice of debugged algorithms and containers at your disposal. Boost and/or TR1 are pretty much necessities these days.
Whatever action you take to save a few cycles, remember this: Don't try to be smarter than the compiler - measure to verify the gain.
Though not addressing the exact issue, some advice:
Always code to interfaces (when it comes to algorithms) so that you can smoothly replace them with more efficient ones when required.
I agree with the advice on premature optimization. However, there are a few guidelines I like to follow during design because they may be difficult to optimize later:
- Try to allocate all your objects at start-up; minimize the use of new during run time.
- Design your data structures so that your algorithms have a good shot at being O(1).
- And as usual, modularize so that you can rip out and replace parts later. This implies you have a comprehensive suite of unit tests that give you the confidence that your new solution is correct.
- Add performance tests to your unit test suite to make sure you don't inadvertently end up with some O(n*n) code :-) (a crude sketch follows this list)
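A crude sketch of such a performance test (doWork and the half-second budget are illustrative; a real suite would use its test framework's timing support): time the operation and fail loudly if it exceeds the budget, which catches an accidental O(n*n) regression:

#include <cassert>
#include <ctime>

// Hypothetical stand-in: replace with the operation under test.
void doWork(int n) {
    for (volatile int i = 0; i < n; ++i) {}
}

void performanceTest() {
    std::clock_t start = std::clock();
    doWork(100000);
    double seconds = double(std::clock() - start) / CLOCKS_PER_SEC;
    assert(seconds < 0.5);  // illustrative budget: fail loudly on a regression
}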
One tip from a C++ book I referred to (Efficient C++: Performance Programming Techniques by Bulka and Mayhew, which talks explicitly about C++ performance aspects) is this:
When defining constructors, initialize members in the constructor's initializer list rather than assigning to them in the body. Something like:
class x {
public:
    x(const char* str) : m_x(str) {}  // initializer list: constructs m_x directly from str
    // and not: x(const char* str) { m_x = str; }  // default-constructs m_x, then assigns to it
private:
    std::string m_x;
};
The above is something that caught my attention and helped me improve my coding style... the book has more to share on this interesting topic of performance.
Select the best-performing algorithms, use less memory, use less branching, use fast operations, and use a small number of iterations.