I've done a lot of relatively simple numerical programming, and a large amount of programming in general. In that time, I've found that a lot of people tend to perform what I call optimization by suspicion, or optimization by hearsay. Some of these really are optimizations (and indeed, some of them make a significant difference in actual speed of execution), and some are not. I'm trying to compile a list of these, with the end goal of profiling them to see whether they really make a difference with modern compilers and how much of one they make.

Some examples might be:

  • x++ versus ++x
  • If you have to calculate 1/2*a*b*c/(d*e)*x^2 in a loop, where a, b, c, d, e don't vary inside the loop, precalculating the coefficient will be faster (see the sketch after this list)
  • In Fortran, arrays are laid out so that the last index varies slowest, so we nest loops in k, j, i order, rather than i, j, k order to increase cache usage
  • Avoiding if statements inside inner loops
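For the coefficient example above, here's a minimal C++ sketch of what the hand-hoisted version looks like (the function and variable names are made up for illustration):

#include <vector>

// a, b, c, d, e are loop-invariant, so the coefficient is computed once.
double sum_terms(const std::vector<double>& xs,
                 double a, double b, double c, double d, double e) {
    const double coeff = 0.5 * a * b * c / (d * e);  // hoisted out of the loop

    double sum = 0.0;
    for (double x : xs) {
        sum += coeff * x * x;                        // coeff * x^2
    }
    return sum;
}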
+1  A: 

Two that I've seen:

  • Use the smallest data type possible. (Float in place of double, short in place of int.)
  • Use arrays instead of Lists.
Bill the Lizard
I've heard that using shorts instead of ints can actually be slower. The native data type of your processor is an int, so it's quicker to work with and to check for overflow. With shorts, the processor has to do extra work.
Kibbee
This does make sense if the data type reflects one in a database. Short/Byte will use less space in the DB - and if you know in advance that the value can't exceed the data space, then you really SHOULD use the smaller data type.
Alex
@Alex: That's a good point. It *does* make sense when you're storing the data in a database.
Bill the Lizard
It can also make a difference if you're processing e.g. an image in memory. Pixels rarely have 32-bit color depth, and many photos are roughly the size of an L3 cache. Same root problem though: `short` is a bad idea, `short[]` isn't.
MSalters
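To make MSalters' point concrete, a rough C++ sketch (the image size is illustrative, not from the thread):

#include <cstddef>
#include <cstdint>
#include <vector>

int main() {
    const std::size_t pixels = 4000 * 3000;    // a ~12-megapixel photo

    // Same pixel count, half the bytes: twice as many pixels per cache line.
    std::vector<std::int32_t> wide(pixels);    // ~48 MB
    std::vector<std::int16_t> narrow(pixels);  // ~24 MB

    // A lone local `short` buys nothing, but shrinking a multi-megabyte
    // buffer can halve the memory traffic of every pass over it.
    long long sum = 0;
    for (std::int16_t v : narrow) sum += v;
    return sum == 0 ? 0 : 1;
}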
A: 
  • A lot of programmers at my company will very readily switch to pointer arithmetic (C++) instead of using references, array index operators, etc.
  • Using const pointers/references in function arguments for small data types, where a copy might have been just as fast.
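For the second point, a minimal sketch of what's meant (a toy `Point` type, invented for illustration); for something this small, passing by value is usually at least as fast as passing by const reference:

struct Point { double x, y; };   // 16 bytes, trivially copyable

// const-reference version: adds an indirection and may force the caller
// to spill the argument to memory just to take its address.
double dot_ref(const Point& a, const Point& b) {
    return a.x * b.x + a.y * b.y;
}

// by-value version: for small trivially copyable types this is typically
// just as fast, and sometimes faster.
double dot_val(Point a, Point b) {
    return a.x * b.x + a.y * b.y;
}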
David Rutten
+2  A: 

My favourite one is not placing method calls in the conditions of for-loops.

Example (in some language with loops):

for (int i=0; i<a.size(); ++i) {
   //Do something
}

I remember reading about the Underhanded Competition for 2006 where the goal was to write perfectly fine code that ran at extremely different speeds on different platforms. The winner had essentially written a for loop with a strlen call in the condition, resulting in a 100x degradation in speed on Windows using Cygwin's GCC (or something like that). Basically, if it's going to get evaluated every time and the value won't change... don't trust the compiler to pull it out; move it yourself.
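A minimal sketch of doing that hoisting yourself in C++ (assuming `a` is something like a `std::vector` whose size doesn't change inside the loop):

#include <cstddef>
#include <vector>

void process(std::vector<int>& a) {
    const std::size_t n = a.size();    // evaluated once, not every iteration
    for (std::size_t i = 0; i < n; ++i) {
        a[i] *= 2;                     // do something with a[i]
    }
}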

Malaxeur
This is a superstition of mine as well. If the method is const, the compiler should do the optimization, but I often do `for (iterator i=v.begin(),e=v.end();i!=e;++i)` based on paranoia.
wrang-wrang
wrang, const is a compile-time check, not an optimization hint. `const` has too many loopholes to be considered by the optimizer (mutable, aliasing, ...)
peterchen
Actually, it's a fine interaction between language rules, library developers and the optimizer. It may help if all play together. The optimizer will see different instructions if the library writer provided `const` overloads.
MSalters
@peterchen: You're right about aliasing messing with the optimizer. That's why C99 has the "strict aliasing" rule :-). See http://en.wikipedia.org/wiki/Aliasing_%28computing%29#Conflicts_with_optimization
sleske
+1  A: 

Inlining functions/methods prematurely, either by a compiler hint or by copy-and-pasting code.

carl
+1  A: 

It's tricky, because some obvious optimizations aren't done without programmer intervention, though things are getting better all the time.

x++ on user-defined classes in C++ may be slower. For primitives, not so much.
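The reason is visible in the canonical operator pair; a minimal sketch with a toy counter type (invented for illustration):

struct Counter {
    int value = 0;

    Counter& operator++() {      // prefix: bump and return *this, no copy
        ++value;
        return *this;
    }

    Counter operator++(int) {    // postfix: must save a copy of the old state
        Counter old = *this;
        ++value;
        return old;              // wasted work if the caller ignores the result
    }
};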

C++ compilers routinely fail to fold multiple constant additions, divisions, etc. depending on the type.

Multi-dimensional array scan order will matter if the whole array is too large to fit in L1 cache.
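A small C++ sketch of the scan-order point (the matrix size is arbitrary): with row-major storage, putting the last index in the innermost loop walks memory contiguously, while the swapped order strides across it.

#include <cstddef>
#include <vector>

constexpr std::size_t ROWS = 4096, COLS = 4096;

// Row-major layout: element (i, j) lives at index i*COLS + j.
double sum_fast(const std::vector<double>& m) {
    double s = 0.0;
    for (std::size_t i = 0; i < ROWS; ++i)
        for (std::size_t j = 0; j < COLS; ++j)
            s += m[i * COLS + j];   // consecutive j -> consecutive addresses
    return s;
}

// Same arithmetic with the loops swapped: each access jumps COLS*sizeof(double)
// bytes, so the cache helps far less once the matrix outgrows it.
double sum_slow(const std::vector<double>& m) {
    double s = 0.0;
    for (std::size_t j = 0; j < COLS; ++j)
        for (std::size_t i = 0; i < ROWS; ++i)
            s += m[i * COLS + j];
    return s;
}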

wrang-wrang
A: 

One very common one:

  • avoid creating functions, as function calls are supposed to be very costly

One that used to be common (but hopefully isn't any more, as it raises many other issues and became pointless with OO programming):

  • use globals instead of passing constant parameters to functions

Two that I suspect achieve the exact opposite of the goal (they make the compiler's task harder and achieve less optimisation):

  • reuse existing variables
  • avoid creating local variables in loop blocs

Some I still use occasionally (in the past I had proof that compilers did not handle these well, with sometimes huge performance benefits... is that still the case?):

  • use static to avoid exporting local functions outside the module (see the sketch after this list)
  • bypass the linker by direct inclusion of a source file (#include "source.c").
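A minimal sketch of the static point (the helper is hypothetical): the function never appears in the module's exported symbols, which also gives the optimizer a free hand with it.

// file-local helper: invisible outside this translation unit, so the
// compiler may inline it everywhere and drop the standalone definition.
static int clamp_byte(int x) {
    return x < 0 ? 0 : (x > 255 ? 255 : x);
}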

Some old classics:

  • implicit comparisons with 0 or NULL (while (*x++) and such)
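The classic instance of that idiom is the K&R-style string copy; a minimal sketch:

// Copies the NUL-terminated string at src into dst (dst must be large enough).
// The loop condition is an implicit comparison with 0: it stops right after
// copying the terminating '\0'.
void copy_cstring(char* dst, const char* src) {
    while ((*dst++ = *src++)) {
        // empty body: all the work happens in the condition
    }
}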
kriss
"use static to avoid exporting local functions outside module"This is good practice anyway. If I could change two things about C, the first would be to abolish bitfields (or at least make their meaning compiler independent) and the second would be to make the default scoping of functions static, so that to get a function exported you would have to do so explicitly.
smcameron
A: 

x++ versus ++x isn't an optimization issue, it's a clarity of intent issue. Do you actually want to use the old value of x? (Hint: usually no.) If you do, go ahead and use x++, but otherwise, indicate to the maintenance programmer who looks at it in two years that this is just an increment, nothing more.

Just as you wouldn't declare something without const if it semantically should never change, you should use your pre/post increments to show your intent.

Bill
It *was* an optimization when used as a standalone statement (and still is on mediocre compilers).
peterchen
But if it's a standalone statement then you have no use for the previous value of x, and should write ++x because it's semantically correct.
Bill
I see your point, but it is often cited as an optimization, at least by newbie programmers.
notJim
A: 

Don't declare variables inside for loops.

for (int t = 0; t < 5; t++) {
  String s = "Number" + t;
}

I have found this to be better, in some languages, as it seems to reduce some problems:

String s;
for (int t = 0; t < 5; t++) {
  s = "Number" + t;
}
James Black
In what language does it reduce what problem?
Alex
Btw, in your 1st example the object constructor is called as many times as there are iterations of your for loop, whereas in your second one it's only called once. That's surely faster and advisable in general, unless your specific case warrants otherwise.
Alex
At one point in Java, I believe pre-JDK5, I had a memory leak when I did the first one, as it appeared that the JDK would lose reference to these strings, so it wouldn't ever garbage collect them. By doing the second the problem went away.
James Black
Actually, often assignment is more expensive than construction because you need to deal with stuff like self-assignment. Thus, 1 ctor+N assignments is more expensive than N constructors.
MSalters
+12  A: 

The bad name that many premature optimizations have comes from their being meant for a different age.

Then:

  • Memory is scarce, but access is cheap
  • Instructions are expensive
  • Optimizers are stupid, and are easily outdone manually. Your compiler needs a lot of "hints"
  • (at least for the x86 platform): Multiplication is slow, division triply so, and floating point may take ages. Translation from source to machine code was rather straightforward, and CPUs would more or less execute it step by step.

Now:

  • There's more than enough memory for most applications, but access is strongly nonlinear (hotspot access is still cheap)
  • Instructions are very cheap
  • Optimizers are more clever than most of us. They even understand the C++ standard better than many of us do
  • (x86) We've moved from a CISC architecture to a RISCified architecture executing RISCified CISC code. And only Agner Fog and your optimizer know what that actually means.

One thing to note is that when moving from desktop to embedded, you get a general technology throwback of about ten years.


So here's a list of things I either leave to my compiler or don't bother with anymore (the first two are sketched after the list):

  • replacing multiplication by constant by shift-and-add (e.g. x*5 == x+(x<<2))
  • replacing integer division by a constant with magic-number (reciprocal) multiplication
  • precalculating constants
  • lookup tables, e.g. for fast but less accurate sine calculation.
  • XOR for integer swap
  • hoisting simple constant evaluations out of a loop
  • unrolling short constant-length loops
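A minimal sketch of the first two entries; with optimizations on, current compilers emit essentially the same machine code for the plain forms as for the hand-written ones (the magic constant below is the standard one for unsigned division by 10):

// Plain versions: leave the strength reduction to the compiler.
unsigned times5(unsigned x) { return x * 5; }
unsigned div10(unsigned x)  { return x / 10; }

// The kind of hand-written equivalents that used to be worth the trouble.
unsigned times5_manual(unsigned x) { return x + (x << 2); }   // x*5 == x + 4*x
unsigned div10_manual(unsigned x) {
    // multiply by a precomputed reciprocal and keep the high bits
    return static_cast<unsigned>(
        (static_cast<unsigned long long>(x) * 0xCCCCCCCDull) >> 35);
}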


Premature optimizations aren't bad as such. Otherwise, we'd all have to use Bubble sort until it's proven that the sort is a bottleneck :-) Knuth, in his original quote, criticised focusing on the small things rather than dealing with performance on a higher level - design, data access patterns and algorithms.

Early optimization (or "optimization by suspicion", as you say, I like that term) is evil only if it's not proven to be effective and degrades another aspect of the code (such as readability, simplicity or flexibility).

Also, library code gets some leeway. You don't know how often your new UnfuddleString function gets called per end-user action, after all.

peterchen
+1 for library code getting leeway. I avoid premature optimization like the plague in single-purpose code but in library code I'm always tempted to automatically make things as efficient as possible so that if some day I try to use the lib with a huge dataset it "just works".
dsimcha
A: 

There's a tendency to focus on little stuff, and be judgemental about it, while missing the big picture. Much of the way big software is designed has an implicit rationale of performance and therefore, in my opinion, constitutes premature optimization. Examples:

  • Collections with hash-coding. Hash-coding may have O(1) behavior (a performance rationale), but for reasonable-size collections may be way overkill.

  • Notification-style programming. It is hard to keep data fully normalized, so programmers need to have a way to deal with denormalized structure (which can have inconsistent states). The method that is usually used is to have "properties" with getters and setters, which are encouraged to propagate changes instantaneously throughout the data structure, trying to keep it consistent. The rationale is that this has better performance, compared to a strategy of tolerating a certain amount of inconsistency and dealing with it separately. When performance tuning is done, it is seen that these attempts to maintain consistency result in massive extra work being done in response to small changes.

  • Too many classes and too much data structure. Often the rationale for having a class and creating objects is that it was a concept in the head of the programmer, when the same information could possibly have been easily extracted from other data by some sort of O(N) sweep. When people think O(N) they think poor performance (horrors). Thus, performance becomes an implicit premature justification for the extra data structure.

  • Iterators. People are encouraged to use standard collection classes with iterators, and are told that the compiler will do a good job of optimizing the iterators (i.e. they will have good performance). Then when they are used in low-level loops, guess what? Step through the code at the assembly level and they are going through all kinds of generality that no compiler could ever make sense of. In inner loops without function calls, be suspicious of iterators (see the sketch below).
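As a rough illustration of the iterator point (how much it matters today depends heavily on the compiler and the library), the same inner loop written both ways:

#include <cstddef>
#include <vector>

long long sum_indexed(const std::vector<int>& v) {
    long long s = 0;
    for (std::size_t i = 0; i < v.size(); ++i)
        s += v[i];
    return s;
}

long long sum_iterators(const std::vector<int>& v) {
    long long s = 0;
    for (std::vector<int>::const_iterator it = v.begin(); it != v.end(); ++it)
        s += *it;
    return s;
}

// Whether these two differ is exactly the kind of thing to check at the
// assembly level (or with a profiler) rather than assume either way.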

In the work I've done in performance tuning, I often find myself having to rip out a rat's nest of excess data structure causing massive performance problems. And the reason most often given for the excess data structure is "performance".

Here's a small example.

ADDED: There are other false-anticipation issues that can cause performance problems, such as over-eager globalization. Unnecessarily extracting strings from resources at app startup is not cheap.

Mike Dunlavey