I've been trying to get a good hold on the x86 assembly language, and was wondering if there was a quick-and-short equivalent of movl $1, %eax. That's when I thought that a list of idioms used frequently in the language would perhaps be a good idea.

This could include the preferred use of xorl %eax, %eax as opposed to movl $0, %eax, or testl %eax, %eax against cmpl $0, %eax.

Oh, and kindly post one example per post!

+4  A: 

on x64:

xor eax, eax 

instead of

xor rax, rax

(the first one also implicitly clears the upper half of rax, and it has a shorter encoding because it needs no REX prefix)

PhiS
+5  A: 

Using LEA for e.g. multiplication, like:

lea eax, [ecx+ecx*4]   

for EAX = 5 * ECX
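As a quick sanity check, the address arithmetic can be modelled in C. This is only a sketch: `lea_times5` and `check_lea_times5` are hypothetical helper names, and the C models the 32-bit wraparound arithmetic of the instruction rather than the instruction itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of: lea eax, [ecx+ecx*4]
   base + index*scale, computed modulo 2^32 like the real instruction. */
uint32_t lea_times5(uint32_t ecx)
{
    return ecx + (ecx << 2);   /* ecx + ecx*4 */
}

/* The result matches a plain 32-bit multiply by 5, including on overflow. */
void check_lea_times5(void)
{
    assert(lea_times5(7u) == 35u);
    assert(lea_times5(0xFFFFFFFFu) == 0xFFFFFFFFu * 5u);
}
```

The same pattern covers the other scale factors LEA supports (2, 4 and 8), which gives multiplication by 3, 5 and 9 in a single instruction.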

PhiS
BTW: this is dog slow on NetBurst, because Intel removed the barrel-shifter in order to be able to obtain higher clock speeds. Ironically, at the time the P4 came out, this was still documented in Intel's optimization manuals.
Jörg W Mittag
Thanks for the comment re. speed. I realise that an idiom is not necessarily the same thing as an optimisation. However, as an idiom, I think LEA has been fairly widely (ab)used.
PhiS
Well, it *is* an optimization. And it is even officially recommended by Intel. It's just that, after officially recommending it for 15 years, they suddenly released a new CPU on which it was slow, thus essentially requiring recompiling *every single program ever written*. Thankfully, NetBurst died a quick and painful death and all current microarchitectures are evolutions of the Pentium III, not the Pentium 4, so all current CPUs again have a barrel shifter. Basically, *all* Intel CPUs since the 80386 and all Athlons have it; only the Pentium 4 doesn't.
Jörg W Mittag
+4  A: 

You might as well ask how to optimize in assembly. Then you'd have to ask what you're optimizing for: size or speed? Anyway, here's my "idiom", a replacement for xchg:

xor eax, ebx
xor ebx, eax
xor eax, ebx
Sparafusile
**WARNING:** If eax == ebx - Both will be zeroed!
LiraNuna
Are you sure about that? 42 ^ 42 = 0 ; 42 ^ 0 = 42 ; 0 ^ 42 = 42
Sparafusile
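The exchange in the comments above can be checked with a small C model. This is a sketch, and `xor_swap` and `check_xor_swap` are hypothetical helpers, not code from the answer. Equal values held in two different registers survive the swap, as the reply says. The zeroing only happens when both operands are the same location, such as the same register or two pointers to the same memory word.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the three-XOR exchange. */
void xor_swap(uint32_t *a, uint32_t *b)
{
    *a ^= *b;   /* xor eax, ebx */
    *b ^= *a;   /* xor ebx, eax */
    *a ^= *b;   /* xor eax, ebx */
}

void check_xor_swap(void)
{
    uint32_t eax = 42, ebx = 42;
    xor_swap(&eax, &ebx);          /* equal values, distinct locations */
    assert(eax == 42 && ebx == 42);

    uint32_t x = 42;
    xor_swap(&x, &x);              /* same location: the first XOR clears it */
    assert(x == 0);
}
```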
+1  A: 

Using SHL and SHR for multiplication/division by a power of 2
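A minimal C sketch of the arithmetic, for unsigned values only. The helper names `times8`, `div4` and `check_shifts` are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical models of shl/shr by a constant count on an unsigned value. */
uint32_t times8(uint32_t x) { return x << 3; }   /* shl eax, 3 */
uint32_t div4(uint32_t x)   { return x >> 2; }   /* shr eax, 2 */

void check_shifts(void)
{
    assert(times8(5u) == 5u * 8u);
    assert(div4(23u) == 23u / 4u);                   /* remainder is discarded */
    assert(div4(0x80000000u) == 0x80000000u / 4u);   /* top bit is not a sign */
}
```

Note that SHR is the unsigned form. For signed operands the arithmetic shift SAR rounds toward minus infinity while IDIV truncates toward zero, so the two disagree on negative values that are not exact multiples of the divisor.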

PhiS
+1  A: 

Another one (besides xor) for

mov eax, 0   ; B800000000h

is

sub eax, eax ; 29C0h

Rationale: smaller opcode

PhiS
+1  A: 

Don't know whether this counts as an idiom, but on most processors prior to i7

movq xmm0, [eax]
movhps xmm0, [eax+8]

or, if SSE3 is available,

lddqu xmm0, [eax]

are faster for reading from an unaligned memory location than

movdqu xmm0, [eax]
PhiS
+3  A: 

Expanding on my comment:

To an undiscerning processor such as the Pentium Pro, xorl %eax, %eax appears to have a dependency on %eax and thus must wait for the value of that register to be available. Later processors actually have additional logic to recognize that instruction as not having any dependencies.

The instructions incl and decl set some of the flags but leave others unchanged. That's the worst situation if the flags are modeled as a single register for the purpose of instruction reordering: any instruction that reads a flag after an incl or decl must be considered as depending on the incl or decl (in case it's reading one of the flags that this instruction sets) and also on the previous instruction that set the flags (in case it's reading one of the flags that this instruction does not set). A solution would be to divide the flags register into two and to consider dependencies with this finer grain... but AMD took a different route in the 64-bit extension they proposed a few years back: the one-byte encodings of these instructions were repurposed as REX prefixes, leaving only the longer two-byte forms available.

Regarding the links, I found this either in the Intel manuals for which it's useless to provide a link because they are on a corporate website that's reorganized every six months, or on Agner Fog's site: http://www.agner.org/optimize/#manuals

Pascal Cuoq
+3  A: 

In loops...

  dec     ecx 
  cmp     ecx, -1       
  jnz     Loop              

can be replaced by

  dec     ecx  
  jns     Loop 

The second form is faster and shorter; the two are equivalent as long as ecx is non-negative on entry.
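To see that both loop endings run the body the same number of times, here is a rough C model that counts iterations. The function names are hypothetical, and the sketch assumes the counter starts non-negative.

```c
#include <assert.h>
#include <stdint.h>

/* Loop ending in: dec ecx / cmp ecx, -1 / jnz Loop */
int iterations_cmp(int32_t ecx)
{
    int n = 0;
    do {
        n++;                 /* loop body */
        ecx--;               /* dec ecx */
    } while (ecx != -1);     /* cmp ecx, -1 ; jnz Loop */
    return n;
}

/* Loop ending in: dec ecx / jns Loop */
int iterations_jns(int32_t ecx)
{
    int n = 0;
    do {
        n++;                 /* loop body */
        ecx--;               /* dec ecx sets SF from the result */
    } while (ecx >= 0);      /* jns Loop: repeat while SF is clear */
    return n;
}

void check_loops(void)
{
    /* A counter starting at 3 runs the body for 3, 2, 1 and 0. */
    assert(iterations_cmp(3) == 4);
    assert(iterations_jns(3) == 4);
}
```

The shorter form works because dec already updates the sign flag, so the separate cmp is redundant.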

GJ
+3  A: 

Here's another interesting "idiom". Hopefully everyone knows that division is a big time sink even compared to a multiplication. Using a little math, it's possible to multiply by the inverse of a constant instead of dividing by it. This goes beyond the shr tricks. For example, to divide by 5:

mov eax, some_number
mov ebx, 3435973837    ; 32-bit inverse of 5
mul ebx

Now eax holds some_number / 5 without using the slow div opcode, provided some_number is an exact multiple of 5: the constant is the multiplicative inverse of 5 modulo 2^32, so for other inputs the low 32 bits of the product are not the quotient. Here is a list of useful constants for division, shamelessly stolen from http://blogs.msdn.com/devdev/archive/2005/12/12/502980.aspx

divisor  inverse (mod 2^32)
3        2863311531
5        3435973837
7        3067833783
9        954437177
11       3123612579
13       3303820997
15       4008636143
17       4042322161

For numbers not on the list, you might need to do a shift beforehand (to divide by 6, shr 1, then multiply by the inverse of 3).
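The behaviour of these constants can be checked in C. The sketch below is an illustration under assumptions, with hypothetical function names. `div5_exact` models the answer's code, taking the low 32 bits of the product (eax after mul). `div5_any` is not from the answer: it is the related variant compilers commonly emit for arbitrary dividends, which takes the high bits of the 64-bit product instead.

```c
#include <assert.h>
#include <stdint.h>

/* Low 32 bits of n * inverse: equals n / 5 only when n is a multiple of 5. */
uint32_t div5_exact(uint32_t n)
{
    return n * 3435973837u;                        /* mul ebx, result in eax */
}

/* Related technique (not from the answer): 3435973837 is also ceil(2^34 / 5),
   so the top bits of the 64-bit product give floor(n / 5) for any 32-bit n. */
uint32_t div5_any(uint32_t n)
{
    uint64_t product = (uint64_t)n * 3435973837u;  /* edx:eax after mul */
    return (uint32_t)(product >> 34);              /* edx shifted right by 2 */
}

void check_div5(void)
{
    assert(div5_exact(35u) == 7u);    /* exact multiple: correct quotient */
    assert(div5_exact(7u) != 1u);     /* not a multiple: low half is not 7 / 5 */
    assert(div5_any(7u) == 1u);
    assert(div5_any(4294967295u) == 4294967295u / 5u);
}
```

In assembly terms the second variant reads its result from edx rather than eax, followed by a shift right by 2.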

Sparafusile