views:

515

answers:

19

I would like to understand good code optimization methods and methodology.

  1. How do I avoid premature optimization if I am already thinking about performance?
  2. How do I find the bottlenecks in my code?
  3. How do I make sure that over time my program does not become any slower?
  4. What are some common performance errors to avoid? (e.g., I know it is bad in some languages to return from inside the catch portion of a try{} catch{} block.)
+22  A: 
  1. Don't think about performance, think about clarity and correctness.
  2. Use a profiler.
  3. Keep using a profiler.
  4. See 1.
anon
A: 

1, 2, and 3 have the same answer: Profile. Get a good profiler and run it on your app, both in intrusive and in sampling modes. This will show you where your bottlenecks are, how severe they are, and doing it on a regular basis will show you where perf has gotten worse from week to week.

You can also simply put stopwatches into your app so that it tells you, say, exactly how many seconds it takes to load a file; you'll notice if that number gets bigger, especially if you log it.
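Such a stopwatch needn't be more than a couple of lines. A hypothetical Java sketch (loadFile here is a stand-in for whatever work you want to track):

```java
public class LoadTimer {
    // Hypothetical stand-in for the real file-loading work.
    static void loadFile() {
        try { Thread.sleep(50); } catch (InterruptedException e) { }
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        loadFile();
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        // Log this number so you notice when it grows from week to week.
        System.out.println("loadFile took " + elapsedMs + " ms");
    }
}
```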

4 is a big, big question that ranges all the way from high-level algorithm design down to tiny details of a particular CPU's pipeline. There's lots of resources out there, but always start at the high level -- take your outer loop from O(N^2) to O(N log N) before you start to worry about integer opcode latency and the like.
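As a hypothetical Java illustration of that outer-loop fix, compare a nested-loop duplicate check (O(N^2)) with sorting first and scanning adjacent elements (O(N log N)):

```java
import java.util.Arrays;

public class Dupes {
    // O(N^2): compares every pair of elements.
    static boolean hasDupeQuadratic(int[] a) {
        for (int i = 0; i < a.length; i++)
            for (int j = i + 1; j < a.length; j++)
                if (a[i] == a[j]) return true;
        return false;
    }

    // O(N log N): sort a copy, then duplicates must be adjacent.
    static boolean hasDupeSorted(int[] a) {
        int[] copy = a.clone();
        Arrays.sort(copy);
        for (int i = 1; i < copy.length; i++)
            if (copy[i] == copy[i - 1]) return true;
        return false;
    }
}
```

The algorithmic change dwarfs anything you could gain by micro-tuning the inner comparison.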

Crashworks
+4  A: 

We had a subcontractor write us a non-trivial amount of code for a non-trivial program. Apparently they always tested with a trivial amount of data. So...

  • Use a profiler.
  • Use a non-trivial amount of data when testing. If possible, make it a humongous amount of data.
  • Use sane algorithms when things become tight.
  • Use a profiler to check that whatever "optimization" you've done is actually an improvement, for example the recent "java jar" fiasco where an O(1) operation was implemented as O(n).
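That kind of hidden-cost bug is easy to reproduce: an operation that looks O(1) but is really O(n) turns a linear loop quadratic, and only non-trivial data exposes it. A hypothetical Java sketch:

```java
import java.util.List;

public class HiddenCost {
    // Looks linear, but LinkedList.get(i) walks from the head each
    // time, so on a LinkedList this loop is actually O(n^2).
    static long sumByIndex(List<Long> list) {
        long sum = 0;
        for (int i = 0; i < list.size(); i++)
            sum += list.get(i);
        return sum;
    }

    // The iterator walks the list once: O(n) regardless of list type.
    static long sumByIterator(List<Long> list) {
        long sum = 0;
        for (long x : list)
            sum += x;
        return sum;
    }
}
```

With a ten-element list both versions are instant; with ten million elements, only a profiler run on real data will catch the first one.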
Pasi Savolainen
+5  A: 

Make sure you have clearly defined performance goals and tests that measure against those goals so you can quickly find out if you even have a problem.

Think about performance more from a design perspective than from a coding perspective - optimizing a poorly performing design just results in faster slow code.

When you do have a performance problem, use a tool such as a profiler to identify the problem - you can guess where your bottlenecks are and usually guess wrong.

Fix performance problems early in development rather than putting them off - as time goes on and features make it into the product fixing perf issues will only become more and more difficult.

Michael
"Fix performance problems early in development rather than putting them off - as time goes on and features make it into the product fixing perf issues will only become more and more difficult." Quoted-for-truth. Sometimes I feel like everyone else on Stack Overflow has forgotten or chosen to ignore this principle.
Crashworks
Yes, I recall one project where they were not even close to meeting their performance goals and had scheduled "performance" for a much later milestone. I don't know how they expected to meet their perf goals later with a load of features piled on top of their already underperforming code base. And their perf milestone was a huge failure.
Michael
+4  A: 

The number one rule I use is, DRY (Don't Repeat Yourself). I find that this rule does a good job of highlighting problem areas that can be fixed without hurting the clarity of the program. It also makes it easier to fix bottlenecks once you discover them.
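As a hypothetical illustration of that last point: when an expensive computation lives in one method instead of being repeated at every call site, fixing the bottleneck (here, by adding a cache) fixes every caller at once. Pricing, fetchBasePrice, and the SKU example are all made up:

```java
import java.util.HashMap;
import java.util.Map;

public class Pricing {
    private final Map<String, Double> cache = new HashMap<>();

    // Hypothetical expensive computation, e.g. a database call.
    private double fetchBasePrice(String sku) {
        return sku.length() * 10.0; // stand-in for real work
    }

    // DRY: the one place prices are computed. Once profiling showed it
    // was hot, a cache could be added here without touching any caller.
    public double basePrice(String sku) {
        return cache.computeIfAbsent(sku, this::fetchBasePrice);
    }
}
```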

Adam Ruth
+1  A: 

Profile, profile, profile. Use valgrind if you can (along with the kcachegrind visualizer), otherwise use the venerable gprof.

My top performance hits:

  1. Allocating memory without freeing it. Only possible in C and C++.
  2. Allocating memory.
  3. Calls to really small procedures, functions, or methods that your compiler somehow fails to inline.
  4. Memory traffic.

Everything else is in the noise.

Norman Ramsey
+1  A: 

This takes experience, I'm afraid. When you conceive of solutions to a problem, you may think in terms of class hierarchies, or you may think in terms of what information goes in, what comes out, how long does it need to be persistent in between. I recommend the latter.

In any case, what people have said is mostly good advice - keep it clean and simple, and get rid of performance problems as they come in, because they will come in.

Where I part company is I don't find measurement very helpful for locating performance problems compared to this method.

But whatever method you use, hopefully experience will teach you what NOT to do in developing software. I've been solving performance problems for a long time, and these days, the single most popular performance killer is galloping generality. Nobody likes to hear their favorite beliefs questioned, but time after time, especially in big software, what is killing performance is using bazookas to swat flies.

Oddly enough, the reason often given for this over-design is guess what? Performance.

In whatever venue you may have learned to program, chances are you've learned all about academic things like sophisticated data structures, abstract class hierarchies, tricky compiler optimization techniques - all the latest stuff that's fun and interesting to know, and that I like as much as anybody. What they didn't teach you is when to use it, which is almost never.

So what I recommend you do is: Get the experience. It is the best teacher.

Mike Dunlavey
+11  A: 

The rules of Optimization Club:

  1. The first rule of Optimization Club is, you do not Optimize.
  2. The second rule of Optimization Club is, you do not Optimize without measuring.
  3. If your app is running faster than the underlying transport protocol, the optimization is over.
  4. One factor at a time.
  5. No marketroids, no marketroid schedules.
  6. Testing will go on as long as it has to.
  7. If this is your first night at Optimization Club, you have to write a test case.

http://xoa.petdance.com/Rules_of_Optimization_Club

Rule #3 is the one that trips most people up. It doesn't matter how fast your calculations are if your program sits waiting for disk writes or network transfer.

Rules #6 and #7: Always have tests. If you're optimizing, you're refactoring, and you don't want to be refactoring without having a solid test suite.
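A rule #7 test case needn't be fancy; even a plain assertion that pins the current behavior lets you optimize without fear. A hypothetical Java sketch (commissionCents is a made-up function about to be optimized):

```java
public class CommissionTest {
    // The function about to be optimized; amounts in cents to
    // avoid floating-point comparisons in the test.
    static long commissionCents(long saleCents) {
        return saleCents > 100_000 ? saleCents * 5 / 100
                                   : saleCents * 3 / 100;
    }

    public static void main(String[] args) {
        // Pin the current behavior before touching the implementation.
        if (commissionCents(50_000) != 1_500) throw new AssertionError();
        if (commissionCents(200_000) != 10_000) throw new AssertionError();
        System.out.println("ok");
    }
}
```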

Andy Lester
+3  A: 

Early optimization isn't always premature - it's bad only if you hurt other interests (readability, maintenance, time to implement, code size, ...) without justification.

On stackoverflow, early optimization is the new goto, don't get discouraged by that. Any decision going wrong early is hard to fix later. Optimization is special only because experience shows it can often be fixed locally, whereas sucky code requires large scale changes.


Sorry for the rant, now for your actual question:

Know your environment!
This includes all the low-level details - e.g. the nonlinearity of memory access, things the compiler can optimize, etc. The trick is not to fret too much; there's a lot you shouldn't worry about, just be aware of.

Measure measure measure!
The results of actual optimization attempts are often surprising, especially if you vary seemingly unrelated factors. It is also the best way to develop a relaxed attitude towards performance - most of the time it really doesn't matter.

Think about algorithms before you think about implementation details.
Most low level optimizations give you a factor of 1.1, a different algorithm can give you a factor of 10. A good (!) caching strategy can give you a factor of 100. Figuring out that you really don't need to make the call gives you Warp 10.

This usually leads me to thinking about how to organize the data: what are frequent operations that are potential bottlenecks, or scalability issues?
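A caching strategy in the above sense can be as small as memoizing a pure function. A hypothetical Java sketch (the expensive method and its loop are stand-ins for real work):

```java
import java.util.HashMap;
import java.util.Map;

public class Memo {
    private final Map<Integer, Long> cache = new HashMap<>();
    int calls = 0; // counts how often the expensive path actually runs

    // Hypothetical expensive pure function of its argument.
    long expensive(int n) {
        calls++;
        long acc = 0;
        for (int i = 0; i < 1_000_000; i++) acc += (long) n * i;
        return acc;
    }

    // Same result; the work happens at most once per distinct input.
    long cached(int n) {
        return cache.computeIfAbsent(n, this::expensive);
    }
}
```

Whether this gives you a factor of 100 depends entirely on how often the same inputs recur - which is exactly the "frequent operations" question above.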

peterchen
A: 
  • Only optimize when you have performance issues.
  • Only optimize the slow parts, as measured!
  • Finding a better algorithm can save you orders of magnitude, rather than a few percent.

It's mentioned above, but it's worth talking about more: measure! You must measure to make sure you're optimizing the right thing. You must measure to know if you've improved, or improved enough, and by how much. Record your measurements!

Also, often you will identify a routine as taking, say, >75% of the total time. It's worth taking the time to profile at a finer grain... often you will find most of the time within that routine is spent in a very small part of the code.

dwc
+7  A: 

It's always good to remember what things "cost". Some C# examples:

  • String concatenation always creates a new string as strings are immutable. Therefore, for repeated concatenations, a StringBuilder is more efficient.

  • Repeated or large memory allocations are generally something you should watch for.

  • Exceptions are very expensive to throw. This is one of the reasons why exceptions should only be used for exceptional situations.
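The concatenation point holds in Java just as in C#; a minimal sketch of the two approaches:

```java
public class Concat {
    // Each += allocates a brand-new string, so joining n parts
    // copies O(n^2) characters overall.
    static String joinNaive(String[] parts) {
        String s = "";
        for (String p : parts) s += p;
        return s;
    }

    // StringBuilder grows one internal buffer: roughly O(n) overall.
    static String joinBuilder(String[] parts) {
        StringBuilder sb = new StringBuilder();
        for (String p : parts) sb.append(p);
        return sb.toString();
    }
}
```

For a handful of parts the difference is noise; in a loop over thousands of parts it is the difference between milliseconds and seconds.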

Most things beyond this are premature optimization. Use a profiler if speed matters.


Regarding your "optimizations":

  • I doubt that floating point arithmetic (* 0.5) is faster than an integer division (/ 2).

  • If you need an array of size 300 you should initialize an array of size 300. There's nothing "magic" about powers of 2 that make arrays of size 256 more efficient.

  • "requires 2 calls in the code" is wrong.

dtb
okay i consider myself corrected :) good to know
Brock Woolf
On Core2, floating point multiplication (fmul, 5 cycles latency) actually is significantly cheaper than integer division (idiv, 18-42 cycles latency, depending on how big the numbers are). See Agner Fog's instruction tables for more details. But that's not relevant here, because this example won't actually compile to idiv.
Josh Haberman
@Josh: `fmul` may be faster than `idiv` – but what about the conversions to and from `double`?
Konrad Rudolph
Actually, exceptions are expensive to *create* not as much so to *throw*.
Software Monkey
+3  A: 

I would say that little optimisations you can make on the go are exactly those that do not make much sense. If you want to optimise, write the code first, then profile and only then optimise the parts that take too much time. And even then it’s usually algorithms that need optimization, not the actual code.

zoul
+1  A: 

You may want to look into what compilers DO optimize away - many compilers can optimize away things like tail recursion, and most other minor optimizations are trivial by comparison. My methodology is to write things so that they're as readable/manageable as possible, and then, if I need to, look to see if the generated assembly code needs optimization of some sort. This way no time needs to be spent optimizing things that don't need to be optimized.

T.R.
How do you learn that from a debugger?
Thorbjørn Ravn Andersen
Ah, hehe, I misspoke.
T.R.
+14  A: 

EDIT This answer originally appeared in another question (that has been merged) where the OP listed some possible optimization techniques that he just assumed must work. All of them relied heavily on assumptions (such as x << 1 always being faster than x * 2). Below I am trying to point out the danger of such assumptions.


Since all of your points are probably wrong, this shows the danger of such premature and trivial optimizations. Leave such decisions to the compiler, unless you know very well what you are doing and that it matters.

Otherwise it – just – doesn’t – matter.

Much more important (and not at all premature) are optimizations in the general program structure. For example, it is probably very bad to re-generate the same large batch of data over and over because it’s needed in many places. Instead, some thought has to be put into the design to allow sharing this data and thus only calculating it once.

It’s also very important to know the domain you’re working in. I come from a bioinformatics background and do a lot of hardcore algorithm work in C++. I often deal with huge amounts of data. At the moment though, I’m creating an application in Java and I cringe every time I create a copy of a container because I’m conditioned to avoid such operations like the plague. But for my fancy Java GUI those operations are completely trivial and not one bit noticeable to the user. Oh well, I’ve just got to get over myself.

By the way:

Initialise constants when you declare them (if you can) …

Well, in many languages (e.g. C++) constants (i.e. identifiers marked as const) have to be initialized upon definition so you don’t actually have a choice in the matter. However, it is a good idea to follow that rule, not only for constants but in general. The reason isn’t necessarily performance, though. It’s just much cleaner code because it clearly binds each identifier to a purpose instead of letting it fly around.
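In Java the analogue is final, which enforces exactly this assign-once discipline for both fields and locals. A hypothetical sketch (Config, maxRetries, and backoffMillis are made up for illustration):

```java
public class Config {
    // A final field must be assigned exactly once, here in the
    // constructor; the compiler rejects any later reassignment.
    private final int maxRetries;

    Config(int maxRetries) {
        this.maxRetries = maxRetries;
    }

    int backoffMillis(int attempt) {
        // A final local binds the identifier to a single purpose.
        final int baseDelay = 100;
        return baseDelay << Math.min(attempt, maxRetries);
    }
}
```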

Konrad Rudolph
great points +1
Brock Woolf
Agree that the poster's optimization tricks don't work.
Martin York
+2  A: 

I don't know if you will buy it, but early optimization is not the root of all evil

Aaron Qian
+1: Note - I almost down-voted this based on the (originally) inverted assertion of the link text (which I have just corrected), until I actually read the article.
Software Monkey
+4  A: 
  1. Write readable code showing intent. Do not microoptimize - you cannot outsmart the JIT.
  2. Learn to use a profiler, e.g. jvisualvm in the Sun JDK, and use it.
Thorbjørn Ravn Andersen
+3  A: 

I'd even say for integral types, instead of multiplying by 0.5 you could just shift them one bit to the right (not forgetting signed/unsigned shifting).
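In Java the signed/unsigned distinction shows up as two separate operators: >> shifts arithmetically (preserving the sign), while >>> shifts in zeros. Note also that for negative odd values, x >> 1 rounds toward negative infinity while x / 2 rounds toward zero, so the "optimization" can silently change results:

```java
public class Shifts {
    public static void main(String[] args) {
        System.out.println(8 >> 1);   // 4
        System.out.println(-7 >> 1);  // -4: rounds toward negative infinity
        System.out.println(-7 / 2);   // -3: division rounds toward zero
        System.out.println(-7 >>> 1); // 2147483644: zeros shifted in
    }
}
```

Which is yet another reason to write the division you mean and let the compiler pick the instructions.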

I am quite sure that, at least in the case of C#, the compiler will optimize a lot.

for example:

Guid x = Guid.NewGuid();

as well as

Guid x; x = Guid.NewGuid();

both translate to the following CIL:

call        System.Guid.NewGuid
stloc.0

Constant expressions such as (4 + 5) are precalculated as well as string concatenations like "Hello " + "World".

I would primarily concentrate on readable and maintainable code though. It's highly unlikely that such microoptimizations make a big difference except for special edge cases.

In my case, the most gain in (sometimes perceived) performance were things like the following:

  • If you fetch data from a database or the internet, use a separate thread
  • If you load large XML files, do it on a separate thread.

Of course, that's only my experience from C#, but input/output operations are common bottlenecks.
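The same idea in Java looks roughly like this (a hypothetical sketch using ExecutorService; fetchData stands in for the slow network or file read):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class BackgroundFetch {
    // Hypothetical slow I/O, e.g. a network call or large-file read.
    static String fetchData() {
        try { Thread.sleep(100); } catch (InterruptedException e) { }
        return "payload";
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        // Kick off the slow work; the main/UI thread stays responsive.
        Future<String> future = pool.submit(BackgroundFetch::fetchData);
        // ... keep the interface responsive here ...
        System.out.println(future.get()); // collect the result when ready
        pool.shutdown();
    }
}
```

The perceived performance win comes not from making the I/O faster but from not blocking the user while it runs.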

Botz3000
+18  A: 

For the kinds of optimizations you are suggesting, you should write your code for clarity and not optimize them until you have proof that they are a bottleneck.

One danger of attempting micro-optimizations like this is that you will likely make things slower, because the compiler is smarter than you are a lot of the time.

Take your "optimization":

const int windowPosX = (screenWidth * 0.5) - (windowWidth * 0.5);

There is no serious compiler in the world that doesn't know that the fastest way to divide by two is to shift right by one. Multiplying by floating-point 0.5 is actually more expensive, because it requires converting to floating-point and back, and doing two multiplies (which are more expensive than shifts).

But don't take my word for it. Look at what the compiler actually does. gcc 4.3.3 on 32-bit Ubuntu (-O3, -msse3, -fomit-frame-pointer) compiles this:

int posx(unsigned int screen_width, unsigned int window_width) {
  return (screen_width / 2) - (window_width / 2);
}

to this:

00000040 <posx>:
  40:   8b 44 24 04           mov    eax,DWORD PTR [esp+0x4]
  44:   8b 54 24 08           mov    edx,DWORD PTR [esp+0x8]
  48:   d1 e8                 shr    eax,1
  4a:   d1 ea                 shr    edx,1
  4c:   29 d0                 sub    eax,edx
  4e:   c3                    ret

Two shifts (using an immediate operand) and a subtract. Very cheap. On the other hand, it compiles this:

int posx(unsigned int screen_width, unsigned int window_width) {
  return (screen_width * 0.5) - (window_width * 0.5);
}

to this:

00000000 <posx>:
   0:   83 ec 04              sub    esp,0x4
   3:   31 d2                 xor    edx,edx
   5:   8b 44 24 08           mov    eax,DWORD PTR [esp+0x8]
   9:   52                    push   edx
   a:   31 d2                 xor    edx,edx
   c:   50                    push   eax
   d:   df 2c 24              fild   QWORD PTR [esp]
  10:   83 c4 08              add    esp,0x8
  13:   d8 0d 00 00 00 00     fmul   DWORD PTR ds:0x0
      15: R_386_32 .rodata.cst4
  19:   8b 44 24 0c           mov    eax,DWORD PTR [esp+0xc]
  1d:   52                    push   edx
  1e:   50                    push   eax
  1f:   df 2c 24              fild   QWORD PTR [esp]
  22:   d8 0d 04 00 00 00     fmul   DWORD PTR ds:0x4
      24: R_386_32 .rodata.cst4
  28:   de c1                 faddp  st(1),st
  2a:   db 4c 24 08           fisttp DWORD PTR [esp+0x8]
  2e:   8b 44 24 08           mov    eax,DWORD PTR [esp+0x8]
  32:   83 c4 0c              add    esp,0xc
  35:   c3                    ret

What you're seeing is conversion to floating-point, multiplication by a value from the data segment (which may or may not be in cache), and conversion back to integer.

Please think of this example when you're tempted to perform micro-optimizations like this. Not only is it premature, but it might not help at all (in this case it significantly hurt!).

Seriously: don't do it. I think a golden rule is never to do optimizations like this unless you routinely inspect your compiler's output as I have done here.

Josh Haberman
Thank you so much for this. I don't know why it never occurred to me to disassemble it myself. Truly eye opening. It also highlights that you shouldn't believe everything you hear from other programmers (which is where I 'learned' the '*0.5' instead of '/2'). Thanks champ
Brock Woolf
The example is interesting because it shows the disassembly, but it kills your point that the compiler is "smart". It also does not fit your point about premature optimization. I don't think many people would consider *0.5 an optimization. Premature optimization comes from using C to avoid STL containers like vector and string, omitting virtual, or making any other compromise on code clarity and correctness when not necessary. Mainly it is about optimizing code during design, and probably by the time the software is released the design has changed and the optimisation was time/work down the drain.
ufotds
+2  A: 

There are certainly optimizations you can do as you go, such as passing large objects by const reference in C++ (C++ Optimization Strategies and Techniques). The first suggestion ("don't divide by 2") might fall under a "strategy that bombs" - assuming certain operations are faster than others.

One premature (?) optimization I'm guilty of is moving declarations of expensive objects out of loops. E.g. if a loop would require a fresh vector for each iteration, I often do:

std::vector<T> vec;
while (x) {
    vec.clear(); //instead of declaring the vector here
    ...
}

I once made a vector used locally in a member function static as an optimization (to reduce memory allocations - this function was called very often), but that bombed when I decided I'd gain real performance from using more than one of these objects in multiple threads.

UncleBens