views: 858

answers: 12
+10  Q: 

Optimizing Code

You are given a heap of code in your favorite language which combines to form a rather complicated application. It runs rather slowly, and your boss has asked you to optimize it. What are the steps you follow to most efficiently optimize the code?

What strategies have you found to be unsuccessful when optimizing code?

Re-writes: At what point do you decide to stop optimizing and say "This is as fast as it'll get without a complete re-write." In what cases would you advocate a simple complete re-write anyway? How would you go about designing it?

+36  A: 

Profile before attempting any optimisation.

9 times out of 10, the time won't be consumed where you might guess it is.

The usual unsuccessful strategy is micro-optimisation, when what is actually required is the appropriate algorithm.
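To make "profile before attempting any optimisation" concrete, here is a minimal Python sketch using the standard-library `cProfile` and `pstats` modules; the `parse`/`summarize` functions and the data are invented stand-ins for a real workload:

```python
import cProfile
import io
import pstats

def parse(records):
    # Hypothetical stand-in for the code that is *actually* slow.
    return [r.strip().split(",") for r in records]

def summarize(rows):
    # Hypothetical stand-in for the code you *guessed* was slow.
    return sum(len(r) for r in rows)

def main():
    records = ["a,b,c\n"] * 50_000
    rows = parse(records)
    return summarize(rows)

profiler = cProfile.Profile()
profiler.enable()
result = main()
profiler.disable()

# Sort by cumulative time: the top entries, not your intuition,
# tell you where the time actually goes.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(10)
print(stream.getvalue())
```

The report typically confirms the "9 times out of 10" point: the hot spot is rarely where you first guessed.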

Mitch Wheat
Agreed, optimization is useless without a baseline to work from. If you can't measure it, you can't improve it.
paxdiablo
+21  A: 

Steps:

  1. Measure
  2. Analyze
  3. Decide
  4. Implement
  5. Repeat

First get a profiler to measure the code. Do not fall into the trap of assuming you know where bottlenecks are. Even afterwards, if your assumptions prove to be correct, do not think that you can skip the measurement step the next time you have a similar task.

Then analyze your findings. Look at the code, identify the bottlenecks you can do something about that will have an effect. Try to estimate how much of an improvement this will be.

Decide if you want to go down this path, based on your analysis. Will the gains be worth it? Is a rewrite warranted? Perhaps you find that though it runs slowly, it's as good as it is going to get, or that you're close to the top of the performance curve, where the work needed to eke out a minuscule improvement isn't warranted.

Implement your changes, do the rewrite if necessary, or refactor the code if that's the path you've gone down. Do it in small steps so that it becomes easy to measure if your change gave what you wanted or if you need to backtrack a step and try a different route.

Then go back to the start and measure, analyze, decide, implement, etc.

Also, on the note of refactoring code. The first things you should change are the big algorithm-level approaches. Things like replacing a sort algorithm with another that performs better, etc. Do not start with line-level optimizations, like how to get a line that increments a value to go slightly faster. Those are last level optimizations and usually not worth it, unless you're running in extreme performance conditions.
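The "algorithm-level first" advice can be sketched in Python: replacing an O(n²) pairwise duplicate check with an O(n) set-based one dwarfs anything a line-level tweak could achieve (the functions and data here are invented for illustration):

```python
def has_duplicates_quadratic(items):
    # O(n^2): compares every pair -- the kind of algorithmic
    # bottleneck worth fixing before any line-level tuning.
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if items[i] == items[j]:
                return True
    return False

def has_duplicates_linear(items):
    # O(n): a set remembers what we have already seen.
    seen = set()
    for item in items:
        if item in seen:
            return True
        seen.add(item)
    return False

data = list(range(2000)) + [0]  # one duplicate at the end
# Same answer, radically different cost as n grows.
assert has_duplicates_quadratic(data) == has_duplicates_linear(data) == True
```

Because both versions are kept side by side during the change, each small step can be measured and verified, as the answer recommends.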

Lasse V. Karlsen
You forgot 0: Define. Without a precise definition of what "slow", "fast", "fast enough", "performance", "speed" etc. mean, you'll never know when you're done. In fact, you'll never even know whether you're making progress at all. Also, you won't be able to do #1, because you don't know what to measure.
Jörg W Mittag
+8  A: 

Don't even bother to try anything without some sort of profiling; I can't stress this enough! Unfortunately, all the profilers I know either suck or are expensive (yay for business costs!), so I'll let others make recommendations there :).

You know you need a re-write when your data tells you that you need a re-write, which sounds recursive, but really just means that the cost of your current architecture or software stack is by itself enough to push you over the performance cliff, so nothing you do in local changes can fix the overall problem. However, before you reach for the File->New... command, make a plan, build a prototype and test that the prototype does better than the current system for the same task: it's amazing how often there is no noticeable difference!

Simon Buchan
Very well said regarding rewrites, especially the part about the stack costing more than the program.
fluffels
Painful experience :(
Simon Buchan
I agree, Simon. Personally I split "profiling" into two purposes. 1 is to quantify any changes. 2 is to _find_ the problems. People love tools, but in my answer (below) I've explained how I do it.
Mike Dunlavey
A: 

Be sure to have enough unit tests to make sure any optimization you make will not break anything.

Make sure to qualify your execution environment. Sometimes, a simple change in the execution options can go a long way.

Then and only then, start analyzing the code.

The rewrite decision (for already working code) should only be considered if there are enough future evolutions that may not be supported by the current architecture.
If simple fixes can speed up code that is not supposed to evolve much, no complete rewrite should be necessary.

The criterion for stopping is usually determined in collaboration with the end user (the client), but I would suggest a formal document fixing the goals to achieve with this optimization process.
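The "unit tests first" point might look like this in Python with the standard `unittest` module; the function under test is an invented example of code you intend to optimize later:

```python
import unittest

def normalize(text):
    # Hypothetical function slated for optimization: lowercase
    # the input and collapse runs of whitespace.
    return " ".join(text.lower().split())

class NormalizeTests(unittest.TestCase):
    # Pin down the current behaviour, so any optimization that
    # changes the output is caught immediately.
    def test_collapses_whitespace(self):
        self.assertEqual(normalize("  Hello   World "), "hello world")

    def test_empty_input(self):
        self.assertEqual(normalize(""), "")

suite = unittest.defaultTestLoader.loadTestsFromTestCase(NormalizeTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

With the suite green, you are free to experiment with the implementation and re-run it after every change.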

VonC
+1  A: 

Besides profiling, as everyone mentions, the two solutions I always head for first (after profiling) are memoization and lazy loading; they are both easy to implement and normally make a big difference.
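A quick Python sketch of both techniques: `functools.lru_cache` provides memoization out of the box, and a small property-based class illustrates lazy loading (the `Report` class and its data are invented):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    # Memoization: each distinct n is computed only once, turning
    # an exponential recursion into linear work.
    return n if n < 2 else fib(n - 1) + fib(n - 2)

class Report:
    # Lazy loading: the expensive data is not built until the
    # first time something actually asks for it.
    def __init__(self):
        self._data = None

    @property
    def data(self):
        if self._data is None:
            self._data = [i * i for i in range(1000)]  # stand-in for real work
        return self._data

value = fib(80)                  # instant, thanks to the cache
report = Report()
assert report._data is None      # nothing loaded yet
assert len(report.data) == 1000  # loaded on first access
```

Both changes are local and low-risk, which is why they are a good first stop after profiling.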

Robert Gould
A: 

First decide what your optimisation goal is - set a target for timing of particular operations on a given hardware platform. Measure the performance accurately (ensure your results are repeatable) and in a production-like environment (no VMs etc unless that's what you use in production!).

Then if you decide it's already fast enough, you can stop there.

If it's still not good enough, then some extra work will be needed - which is where profiling comes in. You may not be able to use a profiler very well (for example, if it impacts the behaviour too much), in which case instrumentation should be used instead.
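Where a profiler is too intrusive, the instrumentation mentioned here can be as simple as a timing context manager; this Python sketch uses invented labels and workloads:

```python
import time
from contextlib import contextmanager

timings = {}

@contextmanager
def timed(label):
    # Accumulate wall-clock time for the wrapped block under `label`;
    # far less intrusive than a full tracing profiler.
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = timings.get(label, 0.0) + time.perf_counter() - start

with timed("build"):
    squares = [i * i for i in range(100_000)]

with timed("sum"):
    total = sum(squares)

for label, seconds in timings.items():
    print(f"{label}: {seconds:.6f}s")
```

Because the probes only bracket the regions you care about, the measured behaviour stays close to production behaviour.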

MarkR
+4  A: 

First of all, do not forget these:

  • Premature optimisation is the root of all evil
  • Performance comes from the design

Secondly:

Don't assume; try and see.

I think this is the cardinal rule of optimisation: test it. Unless you test it and prove that it's working, you won't know.

In your case, what I'd do is firstly refactor the code and get a grasp of it.

If you have unit tests, you are lucky: just go function by function, and specifically check the most frequently called code (use profiling to observe calls and find the bottlenecks). If you don't have tests, write some basic tests which confirm the overall output under some conditions, so you can ensure that you are not breaking anything and are free to experiment.

dr. evil
A: 

Assuming the code needs optimization, I would:

1.) Optimize the way buffers are handled and used - cache optimization. Cache latencies are one big area that sucks CPU cycles, roughly in the range of 5-10% overhead, so make use of data buffers in a cache-efficient manner.

2.) Code critical loops and MCPS-intensive functions in assembly language, or use the low-level intrinsics provided by the compiler for that hardware target.

3.) Reads/writes from external memory are cycle-expensive. Minimize external memory access as much as possible, or if you have to access it, do it in an efficient way (preloading data, parallel DMA access, etc.).

Generally, if you get around 20% optimization (best case) by following these techniques, I would say that's good enough without any major code restructuring or algorithm redesign. Beyond that point it becomes tricky, complicated, and tedious.
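The cache point is most at home in C-family code, but the access-order idea behind it can be sketched even in Python with a nested list: row-wise traversal visits memory in allocation order, column-wise does not (matrix size is arbitrary):

```python
N = 500
matrix = [[i + j for j in range(N)] for i in range(N)]

def sum_row_major(m):
    # Walks each inner list contiguously: locality-friendly.
    total = 0
    for row in m:
        for value in row:
            total += value
    return total

def sum_column_major(m):
    # Hops between inner lists on every step: same result,
    # worse locality (dramatic in C; milder but real in Python).
    total = 0
    for j in range(len(m[0])):
        for i in range(len(m)):
            total += m[i][j]
    return total

assert sum_row_major(matrix) == sum_column_major(matrix)
```

The two functions compute the same sum; only the traversal order, and hence the memory access pattern, differs.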

-AD

goldenmean
A: 

The languages I have done most optimisation in are C and C++, for numerical computation on Linux. I found that profiling, while useful, can skew your run-time results, so that cheap, frequently called operations (like C++ iterator increments) look more expensive than they really are. So take those results with a grain of salt. Actual strategies that produced a good speed-up:

  1. Use numerical arrays rather than arrays of objects. For example, C++ has a "Complex" datatype. Operations on an array of these were a lot slower than the same operation on two arrays of floats. This can be generalised to "use the machine types" for any performance-critical code.
  2. Write the code to allow the compiler to be more effective in its optimisations. For example, if you have a fixed size array, use array indices so that auto vectorisation (a feature of Intel's compiler) can work.
  3. SIMD instructions can provide a good speedup if your problem fits into the kind of domain that they are designed for (multiplying/dividing floats or ints all at the same time). This is stuff like MMX, SSE, SSE2 etc.
  4. Using lookup tables with interpolation for expensive functions where exact values are not important. This is not always good, as looking up the data in memory might be expensive in its own right.
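Point 4, a lookup table with linear interpolation, might look like this in Python, approximating `math.sin` over one period (the table size is an arbitrary choice):

```python
import math

TABLE_SIZE = 1024
STEP = 2 * math.pi / TABLE_SIZE
# Precompute sin at evenly spaced points over [0, 2*pi].
SIN_TABLE = [math.sin(i * STEP) for i in range(TABLE_SIZE + 1)]

def sin_lookup(x):
    # Reduce x to [0, 2*pi), find the two surrounding table
    # entries, and linearly interpolate between them.
    x = x % (2 * math.pi)
    pos = x / STEP
    i = int(pos)
    frac = pos - i
    return SIN_TABLE[i] + frac * (SIN_TABLE[i + 1] - SIN_TABLE[i])

# Accuracy check: a 1024-entry table is good to a few decimal places.
for x in (0.0, 0.5, 1.0, 2.5, 4.0):
    assert abs(sin_lookup(x) - math.sin(x)) < 1e-4
```

As the answer notes, whether this is a win depends on whether the table lookup (a memory access) is actually cheaper than the function it replaces.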

I hope that gives you some inspiration!

Greg Reynolds
A: 

All good answers.

I would refine the "measure" part of the advice. I perform measurement for the purpose of quantifying any improvements I might make. However, for finding what needs to be fixed (and that is a different purpose), I get several samples of the call stack, manually.

Suppose, for simplicity, the program takes 20 giga-cycles to run, when it should take 10. If I pause it 10 times at random, then in 5 of those occasions, more or less, it will be in one of those unnecessary cycles. I can see if the cycle is necessary by looking at each layer of the call stack. If any call instruction at any level of the stack could be eliminated, then the cycle is unnecessary. If such an instruction appears on multiple stacks, eliminating it will speed up the program, by an amount which is approximately the percentage of stack samples it is on.

Any instruction that shows up on multiple stacks is suspect - the more stacks, the more suspect. Now, call _main is probably not one I can remove, but
foo.cpp:96 call std::vector::iterator::operator++
if it shows up on multiple stacks, is definitely a focus of attention.

One may, for style reasons, not want to replace it, but one will know roughly what price in performance is being paid for that choice.

So optimizing consists of identifying suspects and finding a way to replace or eliminate them.

The hard part (and I've done this many times) is understanding what is necessary and what is not. For that, you need to absorb an understanding of how and why things are done in that program, so that if some activity is a cycle-hog, you know how to safely replace it.

Some cycle-hogs may be simple to fix, but you will quickly run up against ones that you don't know how to safely replace. For that, you need to be more familiar with the code.

It will help if you can pick the brain of someone who's already worked on the program.

Otherwise (assuming the code is ~ 10^6 lines, like I've worked on) you can get some speedup fairly easily, but to go beyond that, it can take months to get to where you're comfortable making significant changes.
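A toy version of this manual stack-sampling idea can even be scripted; the Python sketch below periodically captures another thread's stack via the CPython-specific `sys._current_frames` and counts how often each function appears (the `hot_loop` workload is invented):

```python
import collections
import sys
import threading
import time

def sample_stacks(thread_id, samples=10, interval=0.01):
    # Pause-and-look, automated: every `interval` seconds, walk the
    # target thread's stack and count each function found on it.
    counts = collections.Counter()
    for _ in range(samples):
        frame = sys._current_frames().get(thread_id)
        while frame is not None:
            counts[frame.f_code.co_name] += 1
            frame = frame.f_back
        time.sleep(interval)
    return counts

stop = threading.Event()

def hot_loop():
    # Stand-in for the unnecessary cycles we are hunting.
    total = 0
    while not stop.is_set():
        total += 1

worker = threading.Thread(target=hot_loop)
worker.start()
counts = sample_stacks(worker.ident, samples=10)
stop.set()
worker.join()

# Functions appearing in many samples are the suspects.
print(counts.most_common(3))
```

Just as with manual pausing, the fraction of samples a call appears in approximates the fraction of run time you could save by eliminating it.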

Mike Dunlavey
A: 

As many others have already stated, profiling is your first step.

I would add that focusing on data structures and algorithms as a first step is generally more profitable than diving straight into micro-optimizations.

For one thing, the compiler will normally perform a lot of the classic optimization tricks for you anyway (and often better than you). This is especially true in more modern languages such as C# than in older, less constrained languages such as C, as the compiler has more knowledge of the program structure; worse still, confusing your code by "optimizing" it can actually make it harder for the compiler to apply its own, more effective, optimizations.

Mostly though, there is just a lot more scope to improve things when you start improving the big-oh of your operations. For example, searching a linked list is O(n); meaning the time taken to search it increases at the same rate as the size of the data stored in it. Searching a hashtable is only O(1), so you can add more data without increasing the time to search it (there are other factors when we leave the theoretical world so this isn't quite true, but it is true enough most of the time).

Messing with your loops so they run from high to low so the generated code can save a couple of clock cycles with a JumpIfZero rather than a JumpIfLessThan is probably not going to have quite the same degree of impact!
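The linked-list versus hashtable point maps directly onto Python's list versus set: membership in a list is O(n), while in a set it is O(1) on average (the sizes and iteration counts below are arbitrary):

```python
import timeit

n = 20_000
haystack_list = list(range(n))
haystack_set = set(haystack_list)
needle = n - 1  # worst case for the list: it scans from the front

# Same lookup, two data structures.
list_time = timeit.timeit(lambda: needle in haystack_list, number=200)
set_time = timeit.timeit(lambda: needle in haystack_set, number=200)

assert (needle in haystack_list) and (needle in haystack_set)
print(f"list: {list_time:.4f}s  set: {set_time:.4f}s")
```

Growing `n` widens the gap for the list while leaving the set essentially flat, which is the big-O improvement the answer is describing.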

Richard
A: 

Good strategies

Besides the mentioned basic laws of optimization (measure, don't optimize if not necessary), and although the question explicitly asked for efficiency, I always refactor such code during my inspection.

Typically, code with bad performance is also badly documented. So with the refactoring I make sure the code is better self-documenting and easier to understand. That's the basis for being sure I know what I'm optimizing (as in most cases the requirements for that piece of code will also not be completely available).

When to stop

With a really badly performing application, you will typically see a spike in the runtime shown for a single method (or set of related methods) in your profiler, indicating a programming bug or design flaw. So I typically stop when the runtime of the profiled methods is distributed mostly equally (or when the majority of the bottleneck methods shown are platform code, like Sun Java methods). If further optimization is demanded by your customers, you will have to redesign larger parts of the application instead of optimizing the existing code.

Bananeweizen