ansaurus

Question

Help improving a simple assembly function

Answer 1

+9 A:

    mov eax, number  //move number in eax
    mov ecx, eax     //dup in ecx
    mul ecx          //multiply (n*n)
    jo end           //jump to end if overflow
    add eax, ecx     //addition (n*n+n); can't overflow
    add ecx, ecx     //addition (2n); can't overflow
    add ecx, 1       //addition (2n+1); can't overflow
    mul ecx          //multiply (n*n+n)(2n+1)
    jo end           //jump to end if overflow
    mov ecx, 6       //move 6 in ebx
    div ecx          //divide by 6, the result will be in eax

    mov result, eax //move eax in result

Strength reduction: add instead of multiply.

By analysis, fewer overflow checks (you can do better as you described).

Keep values in registers instead of going back to the argument on the stack.

Chose registers carefully so values that can be reused are not overwritten.

Doug Currie 2010-04-16 00:04:20

Add can't overflow? Am I reading this right?

MPelletier 2010-04-16 00:36:10

Can't overflow because n*n didn't, so n*n+n won't. The largest n such that n*n won't overflow is 0xffff. 0xffff * 0xffff + 0xffff = 0xffff0000. Since at that point n <= 0xffff, 2n+1 is at most 0x1ffff, again no overflow.

Doug Currie 2010-04-16 01:14:12

I'd vote you twice right now if I could. That's solid math!

MPelletier 2010-04-16 02:10:32

Answer 2

+3 A:

mov eax, number    ; = n
cmp eax, 0x928     ; cannot handle n >= 0x928
jnc end 
shl eax, 1         ; = n(2)
add eax, 3         ; = n(2)+3
mov ebx, number
mul ebx            ; = n(n(2)+3)
add eax, 1         ; = n(n(2)+3)+1
mov ebx, number
mul ebx            ; = n(n(n(2)+3)+1) = n(n+1)(2n+1)
mov ebx, 6
div ebx
mov result, eax

Rather than checking for overflow, this solution checks the input against the known maximum value that the function can handle. Note that the last multiplication is allowed to overflow, and it will overflow for any input number greater than 0x509. Checking against a known value rather than relying on overflow checks allows the function to handle almost twice as many input values. In fact, the function is able to handle every input whose result fits within 32 bits.

Matthew T. Staebler 2010-04-16 04:05:41

Answer 3

+3 A:

UInt32 sumSquares(const UInt32 number)
{
  __asm
  {
    mov eax, number     // n
    cmd eax, MAX_VALUE
    jg  bad_value

    lea ebx, [eax+1]    // n + 1
    lea ecx, [2*eax+1]  // 2n + 1

    mul ebx
    mul ecx

    shr eax, 1          // divide by 2
    mov ebx, 2863311531 // inverse of 3
    mul ebx             // divide by 3

    ret

    bad_value:
    xor eax, eax        // return 0
    ret
  }
}

http://blogs.msdn.com/devdev/archive/2005/12/12/502980.aspx

Spara

Sparafusile 2010-04-16 11:43:19

The lea tricks are interesting to talk about; I believe they can be slower than the straightforward arithmetic on newer processors. The multiply by inverse is a good technique that I did not have the energy to describe yesterday; I think your result is in edx, though, so you need to move it to eax.

Doug Currie 2010-04-16 13:49:28

@Doug Currie I'm not sure what's making you think the result is in EDX. All operations are done on EAX and the high order bits of the multiply are in EDX (which are discarded). Am I missing something?Interesting arithmetic tricks are certainly possible, but you did a good job of that earlier so I opted for a more straight-forward example.

Sparafusile 2010-04-16 14:15:27

I guessed that you were multiplying by 1431655765 (2^32/3). Reading closer I see that you are not. The multiply by inverse you are using only works when the dividend is an exact multiple of the divisor. Amazingly this is the case for this algorithm. +1

Doug Currie 2010-04-16 20:01:44

ansaurus

tags:

views:

answers:

Help improving a simple assembly function

related questions