I've done a lot of relatively simple numerical programming, and a large amount of programming in general. In that time, I've found that a lot of people tend to perform what I call optimization by suspicion, or optimization by hearsay. Some of these really are optimizations (and indeed, some of them make a significant difference in actual speed of execution), and some are not. I'm trying to compile a list of these, with the end goal of profiling them to see whether they really make a difference with modern compilers and how much of one they make.

Some examples might be:

  • x++ versus ++x
  • If you have to calculate 1/2*a*b*c/(d*e)*x^2 in a loop, where a, b, c, d, e don't vary inside the loop, precalculating the coefficient will be faster (see the sketch after this list)
  • In Fortran, arrays are laid out so that the last index varies slowest, so we nest loops in k, j, i order, rather than i, j, k order to increase cache usage
  • Avoiding if statements inside inner loops
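For the coefficient example above, here's a minimal C++ sketch of what the hand-hoisted version looks like (the function and variable names are made up for illustration):

#include <vector>

// a, b, c, d, e are loop-invariant, so the coefficient is computed once.
double sum_terms(const std::vector<double>& xs,
                 double a, double b, double c, double d, double e) {
    const double coeff = 0.5 * a * b * c / (d * e);  // hoisted out of the loop

    double sum = 0.0;
    for (double x : xs) {
        sum += coeff * x * x;                        // coeff * x^2
    }
    return sum;
}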
+1  A: 

Two that I've seen:

  • Use the smallest data type possible. (Float in place of double, short in place of int.)
  • Use arrays instead of Lists.
Bill the Lizard
I've heard that using shorts instead of ints can actually be slower. The native data type of your processor is an int, so it's quicker to work with and to check for overflow. With shorts, the processor has to do extra work.
Kibbee
This does make sense if the data type reflects one in a database. Short/Byte will use less space in the DB - and if you know in advance that the value can't exceed the data space, then you really SHOULD use the smaller data type.
Alex
@Alex: That's a good point. It *does* make sense when you're storing the data in a database.
Bill the Lizard
It can also make a difference if you're processing e.g. an image in memory. Pixels rarely have 32-bit color depth, and many photos are roughly the size of an L3 cache. Same root problem though: `short` is a bad idea, `short[]` isn't.
MSalters
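To make MSalters' point concrete, a rough C++ sketch (the image size is illustrative, not from the thread):

#include <cstddef>
#include <cstdint>
#include <vector>

int main() {
    const std::size_t pixels = 4000 * 3000;    // a ~12-megapixel photo

    // Same pixel count, half the bytes: twice as many pixels per cache line.
    std::vector<std::int32_t> wide(pixels);    // ~48 MB
    std::vector<std::int16_t> narrow(pixels);  // ~24 MB

    // A lone local `short` buys nothing, but shrinking a multi-megabyte
    // buffer can halve the memory traffic of every pass over it.
    long long sum = 0;
    for (std::int16_t v : narrow) sum += v;
    return sum == 0 ? 0 : 1;
}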
A: 
  • A lot of programmers at my company will very readily switch to pointer arithmetic (C++) instead of using references, array index operators, etc.
  • Using const pointers/references in function arguments for small data types, where a copy might have been just as fast.
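For the second point, a minimal sketch of what's meant (a toy `Point` type, invented for illustration); for something this small, passing by value is usually at least as fast as passing by const reference:

struct Point { double x, y; };   // 16 bytes, trivially copyable

// const-reference version: adds an indirection and may force the caller
// to spill the argument to memory just to take its address.
double dot_ref(const Point& a, const Point& b) {
    return a.x * b.x + a.y * b.y;
}

// by-value version: for small trivially copyable types this is typically
// just as fast, and sometimes faster.
double dot_val(Point a, Point b) {
    return a.x * b.x + a.y * b.y;
}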
David Rutten
+2  A: 

My favourite one is not placing method calls in the conditions of for-loops.

Example (in some language with loops):

for (int i=0; i<a.size(); ++i) {
   //Do something
}

I remember reading about the Underhanded Competition for 2006 where the goal was to write perfectly fine code that ran at extremely different speeds on different platforms. The winner had essentially written a for loop with a strlen call in the condition, resulting in a 100x degradation in speed on Windows using Cygwin's GCC (or something like that). Basically, if it's going to get evaluated every time and the value won't change... don't trust the compiler to pull it out; move it yourself.
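A minimal sketch of doing that hoisting yourself in C++ (assuming `a` is something like a `std::vector` whose size doesn't change inside the loop):

#include <cstddef>
#include <vector>

void process(std::vector<int>& a) {
    const std::size_t n = a.size();    // evaluated once, not every iteration
    for (std::size_t i = 0; i < n; ++i) {
        a[i] *= 2;                     // do something with a[i]
    }
}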

Malaxeur
This is a superstition of mine as well. If the method is const, the compiler should do the optimization, but I often do `for (iterator i=v.begin(),e=v.end();i!=e;++i)` based on paranoia.
wrang-wrang
wrang, const is a compile-time check, not an optimization hint. `const` has too many loopholes to be considered by the optimizer (mutable, aliasing, ...)
peterchen
Actually, it's a fine interaction between language rules, library developers and the optimizer. It may help if all play together. The optimizer will see different instructions if the library writer provided `const` overloads.
MSalters
@peterchen: You're right about aliasing messing with the optimizer. That's why C99 has the "strict aliasing" rule :-). See http://en.wikipedia.org/wiki/Aliasing_%28computing%29#Conflicts_with_optimization
sleske
+1  A: 

Inlining functions/methods prematurely, either by a compiler hint or by copy-and-pasting code.

carl
+1  A: 

It's tricky, because some obvious optimizations aren't done without programmer intervention, though things are getting better all the time.

x++ on user-defined classes in C++ may be slower. For primitives, not so much.
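The reason is visible in the canonical operator pair; a minimal sketch with a toy counter type (invented for illustration):

struct Counter {
    int value = 0;

    Counter& operator++() {      // prefix: bump and return *this, no copy
        ++value;
        return *this;
    }

    Counter operator++(int) {    // postfix: must save a copy of the old state
        Counter old = *this;
        ++value;
        return old;              // wasted work if the caller ignores the result
    }
};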

C++ compilers routinely fail to fold multiple constant additions, divisions, etc. depending on the type.

Multi-dimensional array scan order will matter if the whole array is too large to fit in L1 cache.
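A small C++ sketch of the scan-order point (the matrix size is arbitrary): with row-major storage, putting the last index in the innermost loop walks memory contiguously, while the swapped order strides across it.

#include <cstddef>
#include <vector>

constexpr std::size_t ROWS = 4096, COLS = 4096;

// Row-major layout: element (i, j) lives at index i*COLS + j.
double sum_fast(const std::vector<double>& m) {
    double s = 0.0;
    for (std::size_t i = 0; i < ROWS; ++i)
        for (std::size_t j = 0; j < COLS; ++j)
            s += m[i * COLS + j];   // consecutive j -> consecutive addresses
    return s;
}

// Same arithmetic with the loops swapped: each access jumps COLS*sizeof(double)
// bytes, so the cache helps far less once the matrix outgrows it.
double sum_slow(const std::vector<double>& m) {
    double s = 0.0;
    for (std::size_t j = 0; j < COLS; ++j)
        for (std::size_t i = 0; i < ROWS; ++i)
            s += m[i * COLS + j];
    return s;
}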

wrang-wrang
A: 

One very common one:

  • avoid creating functions, as function calls are supposed to be very costly

One that used to be common (but hopefully isn't any more, as it raises many other issues and became pointless with OO programming):

  • use globals instead of passing constant parameters to functions

Two that I suspect achieve the exact opposite of the goal (they make the compiler's task harder and achieve less optimisation):

  • reuse existing variables
  • avoid creating local variables in loop blocs

Some I still use occasionally (in the past I had proof that compilers did not handle these well, with sometimes huge performance benefits... is that still the case?):

  • use static to avoid exporting local functions outside the module (see the sketch after this list)
  • bypass the linker by direct inclusion of a source file (#include "source.c").
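A minimal sketch of the static point (the helper is hypothetical): the function never appears in the module's exported symbols, which also gives the optimizer a free hand with it.

// file-local helper: invisible outside this translation unit, so the
// compiler may inline it everywhere and drop the standalone definition.
static int clamp_byte(int x) {
    return x < 0 ? 0 : (x > 255 ? 255 : x);
}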

Some old classics:

  • implicit comparisons with 0 or NULL (while (*x++) and such)
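The classic instance of that idiom is the K&R-style string copy; a minimal sketch:

// Copies the NUL-terminated string at src into dst (dst must be large enough).
// The loop condition is an implicit comparison with 0: it stops right after
// copying the terminating '\0'.
void copy_cstring(char* dst, const char* src) {
    while ((*dst++ = *src++)) {
        // empty body: all the work happens in the condition
    }
}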
kriss
"use static to avoid exporting local functions outside module"This is good practice anyway. If I could change two things about C, the first would be to abolish bitfields (or at least make their meaning compiler independent) and the second would be to make the default scoping of functions static, so that to get a function exported you would have to do so explicitly.
smcameron
A: 

x++ versus ++x isn't an optimization issue, it's a clarity of intent issue. Do you actually want to use the old value of x? (Hint: usually no.) If you do, go ahead and use x++, but otherwise, indicate to the maintenance programmer who looks at it in two years that this is just an increment, nothing more.

Just as you wouldn't declare something without const if it semantically should never change, you should use your pre/post increments to show your intent.

Bill
It *was* an optimization when used as a standalone statement (and still is on mediocre compilers).
peterchen
But if it's a standalone statement then you have no use for the previous value of x, and should write ++x because it's semantically correct.
Bill
I see your point, but it is often cited as an optimization, at least by newbie programmers.
notJim
A: 

Don't declare variables inside for loops.

for (int t = 0; t < 5; t++) {
  String s = "Number" + t;
}

I have found this to be better, in some languages, as it seems to reduce some problems:

String s;
for (int t = 0; t < 5; t++) {
  s = "Number" + t;
}
James Black
In what language does it reduce what problem?
Alex
Btw, in your 1st example the object constructor is called as many times as there are iterations of your for loop, whereas in your second one it's only called once. That's surely faster and advisable in general, unless your specific case warrants otherwise.
Alex
At one point in Java, I believe pre-JDK5, I had a memory leak when I did the first one, as it appeared that the JDK would lose reference to these strings, so it wouldn't ever garbage collect them. By doing the second the problem went away.
James Black
Actually, often assignment is more expensive than construction because you need to deal with stuff like self-assignment. Thus, 1 ctor+N assignments is more expensive than N constructors.
MSalters
+12  A: 

The bad name that many premature optimizations have comes from their being meant for a different age.

Then:

  • Memory is scarce, but access is cheap
  • Instructions are expensive
  • Optimizers are stupid, and are easily outdone manually. Your compiler needs a lot of "hints"
  • (at least for the x86 platform): Multiplication is slow, division triply so, and floating point may take ages. Translation from source to machine code was rather straightforward, and CPUs would more or less execute it step by step.

Now:

  • There's more than enough memory for most applications, but access is strongly nonlinear (hotspot access is still cheap)
  • Instructions are very cheap
  • Optimizers are more clever than most of us. They even understand the C++ standard better than many of us do
  • (x86) We've moved from a CISC architecture to a RISCified architecture executing RISCified CISC code. And only Agner Fog and your optimizer know what that actually means.

One thing to note is that when moving from desktop to embedded, you get a general technology throwback of about ten years.


So here's a list of things I either leave to my compiler or don't bother with anymore (the first two are sketched after the list):

  • replacing multiplication by constant by shift-and-add (e.g. x*5 == x+(x<<2))
  • replacing integer division by a constant with magic-number (reciprocal) multiplication
  • precalculating constants
  • lookup tables, e.g. for fast but less accurate sine calculation.
  • XOR for integer swap
  • hoisting simple constant evaluations out of a loop
  • unrolling short constant-length loops
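A minimal sketch of the first two entries; with optimizations on, current compilers emit essentially the same machine code for the plain forms as for the hand-written ones (the magic constant below is the standard one for unsigned division by 10):

// Plain versions: leave the strength reduction to the compiler.
unsigned times5(unsigned x) { return x * 5; }
unsigned div10(unsigned x)  { return x / 10; }

// The kind of hand-written equivalents that used to be worth the trouble.
unsigned times5_manual(unsigned x) { return x + (x << 2); }   // x*5 == x + 4*x
unsigned div10_manual(unsigned x) {
    // multiply by a precomputed reciprocal and keep the high bits
    return static_cast<unsigned>(
        (static_cast<unsigned long long>(x) * 0xCCCCCCCDull) >> 35);
}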


Premature optimizations aren't bad as such. Otherwise, we'd all have to use Bubble sort until it's proven that the sort is a bottleneck :-) Knuth, in his original quote, criticised focusing on the small things rather than dealing with performance on a higher level - design, data access patterns and algorithms.

Early optimization (or "optimization by suspicion", as you say, I like that term) is evil only if it's not proven to be effective and degrades another aspect of the code (such as readability, simplicity or flexibility).

Also, library code gets some leeway. You don't know how often your new UnfuddleString function gets called per end-user action, after all.

peterchen
+1 for library code getting leeway. I avoid premature optimization like the plague in single-purpose code but in library code I'm always tempted to automatically make things as efficient as possible so that if some day I try to use the lib with a huge dataset it "just works".
dsimcha
A: 

There's a tendency to focus on little stuff, and be judgemental about it, while missing the big picture. Much of the way big software is designed has an implicit rationale of performance and therefore, in my opinion, constitutes premature optimization. Examples:

  • Collections with hash-coding. Hash-coding may have O(1) behavior (a performance rationale), but for reasonable-size collections may be way overkill.

  • Notification-style programming. It is hard to keep data fully normalized, so programmers need to have a way to deal with denormalized structure (which can have inconsistent states). The method that is usually used is to have "properties" with getters and setters, which are encouraged to propagate changes instantaneously throughout the data structure, trying to keep it consistent. The rationale is that this has better performance, compared to a strategy of tolerating a certain amount of inconsistency and dealing with it separately. When performance tuning is done, it is seen that these attempts to maintain consistency result in massive extra work being done in response to small changes.

  • Too many classes and too much data structure. Often the rationale for having a class and creating objects is that it was a concept in the head of the programmer, when the same information could possibly have been easily extracted from other data by some sort of O(N) sweep. When people think O(N) they think poor performance (horrors). Thus, performance becomes an implicit premature justification for the extra data structure.

  • Iterators. People are encouraged to use standard collection classes with iterators, and are told that the compiler will do a good job of optimizing the iterators (i.e. they will have good performance). Then when they are used in low-level loops, guess what? Step through the code at the assembly level and they are going through all kinds of generality that no compiler could ever make sense of. In inner loops without function calls, be suspicious of iterators (see the sketch below).
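As a rough illustration of the iterator point (how much it matters today depends heavily on the compiler and the library), the same inner loop written both ways:

#include <cstddef>
#include <vector>

long long sum_indexed(const std::vector<int>& v) {
    long long s = 0;
    for (std::size_t i = 0; i < v.size(); ++i)
        s += v[i];
    return s;
}

long long sum_iterators(const std::vector<int>& v) {
    long long s = 0;
    for (std::vector<int>::const_iterator it = v.begin(); it != v.end(); ++it)
        s += *it;
    return s;
}

// Whether these two differ is exactly the kind of thing to check at the
// assembly level (or with a profiler) rather than assume either way.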

In the work I've done in performance tuning, I often find myself having to rip out a rat's nest of excess data structure causing massive performance problems. And the reason most often given for the excess data structure is "performance".

Here's a small example.

ADDED: There are other false-anticipation issues that can cause performance problems, such as over-eager globalization. Unnecessarily extracting strings from resources at app startup is not cheap.

Mike Dunlavey