I'm working on some code that contains (compiler-generated) chunks of assembly that we've identified as speed bottlenecks.

I know enough about assembly to muddle through and look for manual optimizations. I'm wondering, though, whether there are any good online guides that offer reusable techniques for hand-optimizing assembly. This is not something I expect to do very often, so odds are I'll have to relearn it from scratch each time.

+4  A: 

http://www.agner.org/optimize/

http://www.intel.com/intelpress/sum_swcb2.htm (you have to buy this one, and it stresses intrinsics rather than assembly)

aaa
+5  A: 

http://www.agner.org/optimize/optimizing_assembly.pdf

I'd say "have fun", but it would probably be really mean-spirited :(

I think you're interested in Chapter 9, "Optimizing for Speed".

David Titarenco
That looks like a great book. I'd also recommend Chapter 13 on vector programming.
Karl Bielefeldt
A: 

While this might not need saying...

In general, you'll go a lot further by helping the compiler (I'm using GCC as an example, but this should be relevant for other compilers too):

  • Play with compiler options for a while (e.g. -march=native, -mfpmath=sse, -msse3 on x86; -marm, -mthumb on ARM)
  • Use profiling information when you can (-fprofile-generate, -fprofile-use)
  • Try equivalent ways of writing an expression to see which produces "better" code: (x>>8)&0xFF or (x&0xFF00)>>8? It's one instruction on PPC, but the compiler might use two.
  • Tweak your algorithm so it uses the cache better.
  • Use vector extensions if your compiler supports them. Your compiler may have additional target-specific builtins (x86, ARM NEON).
  • Use a better compiler (RVCT for ARM, ICC for x86)
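
As a minimal sketch of the "equivalent expressions" point above, the two byte-extraction forms can be put in separate functions so their generated assembly can be compared side by side. The function names are made up for illustration, and whether the two compile differently depends entirely on the compiler, its version, and the target.

```c
#include <stdint.h>

/* Two equivalent ways to extract bits 8..15 of x.
 * The names are hypothetical; only the expressions come from the list above. */

/* Shift first, then mask off the low byte. */
uint32_t byte1_shift_then_mask(uint32_t x)
{
    return (x >> 8) & 0xFF;
}

/* Mask the second byte in place, then shift it down. */
uint32_t byte1_mask_then_shift(uint32_t x)
{
    return (x & 0xFF00) >> 8;
}
```

Compiling this file with something like `gcc -O2 -S` and reading the resulting `.s` file shows whether the compiler treats the two forms identically on your target.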

I'd be surprised if you could get more than a 20% speed-up over what a decent C compiler produces, unless there are specific instructions/features which the compiler isn't using. And 20% is rarely worth writing home about unless that code is all your app does.
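
SIMD is one such feature that a compiler may not use on its own. As a rough sketch, GCC's generic vector extensions (also accepted by Clang) let you ask for it from C rather than from assembly. The type name `v4sf` and the function `madd_v4` are assumptions made up for this example, and the instructions actually emitted depend on the target and flags.

```c
/* GCC/Clang generic vector extension: four packed floats in 16 bytes. */
typedef float v4sf __attribute__((vector_size(16)));

/* Hypothetical helper: out[i] = a[i] * b[i] + c[i], element-wise,
 * for n4 vectors of four floats each. The arithmetic operators act on
 * all four lanes at once. */
void madd_v4(v4sf *out, const v4sf *a, const v4sf *b, const v4sf *c, int n4)
{
    for (int i = 0; i < n4; ++i)
        out[i] = a[i] * b[i] + c[i];
}
```

On a target without suitable vector instructions the compiler falls back to scalar code, so the source stays portable across GCC-compatible compilers.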

tc.
+1 for a good answer and to cancel out the unwarranted downvote - this is a very important point - it's pretty hard to beat a *good* compiler when it comes to optimisation on modern CPUs
Paul R
A: 

I agree with the previous answers suggesting Agner Fog's optimisation manuals. They are really great.

In addition, Intel and AMD provide freely available optimisation manuals of their own. The following may be of interest to you:

Intel 64 and IA-32 Architectures Optimization Reference Manual

Software Optimization Guide for AMD Family 10h Processors

PhiS
A: 

Agner Fog's site seems to be a common response. Another page that I have found particularly useful over the years has been Paul Hsieh's page at ...

http://www.azillionmonkeys.com/qed/tech.shtml

Sparky