tags:

views:

128

answers:

3

I just handed in this function in an assignment. It is done (hence no homework tag). But I would like to see how this can be improved.

Essentially, the function sums the squares of all the integers between 1 and the given number, using the following formula:

n(n+1)(2n+1)/6

Where n is the maximum number.

The function below is made to catch any overflow and return 0 should any occur.

UInt32 sumSquares(const UInt32 number)
{
    int result = 0;
    __asm
    {
        mov eax, number  //move number in eax
        mov edx, 2       //move 2 in edx
        mul edx          //multiply (2n)
        jo end           //jump to end if overflow
        add eax, 1       //addition (2n+1)
        jo end           //jump to end if overflow
        mov ecx, eax     //move (2n+1) in ecx

        mov ebx, number  //move number in ebx
        add ebx, 1       //addition (n+1)
        jo end           //jump to end if overflow

        mov eax, number //move number in eax for multiplication
        mul ebx         //multiply n(n+1)
        jo end          //jump to end if overflow
        mul ecx         //multiply n(n+1)(2n+1)
        jo end          //jump to end if overflow
        mov ebx, 6      //move 6 in ebx
        div ebx         //divide by 6, the result will be in eax

        mov result, eax //move eax in result

end:
    }

    return result;
}

Basically, I want to know what I can improve in there. In terms of best-practices mostly. One thing sounds obvious: smarter overflow check (with a single check for whatever maximum input would cause an overflow).

+9  A: 
    mov eax, number  //move number in eax
    mov ecx, eax     //dup in ecx
    mul ecx          //multiply (n*n)
    jo end           //jump to end if overflow
    add eax, ecx     //addition (n*n+n); can't overflow
    add ecx, ecx     //addition (2n); can't overflow
    add ecx, 1       //addition (2n+1); can't overflow
    mul ecx          //multiply (n*n+n)(2n+1)
    jo end           //jump to end if overflow
    mov ecx, 6       //move 6 in ebx
    div ecx          //divide by 6, the result will be in eax

    mov result, eax //move eax in result

Strength reduction: add instead of multiply.

By analysis, fewer overflow checks (you can do better as you described).

Keep values in registers instead of going back to the argument on the stack.

Chose registers carefully so values that can be reused are not overwritten.

Doug Currie
Add can't overflow? Am I reading this right?
MPelletier
Can't overflow because n*n didn't, so n*n+n won't. The largest n such that n*n won't overflow is 0xffff. 0xffff * 0xffff + 0xffff = 0xffff0000. Since at that point n <= 0xffff, 2n+1 is at most 0x1ffff, again no overflow.
Doug Currie
I'd vote you twice right now if I could. That's solid math!
MPelletier
+3  A: 
mov eax, number    ; = n
cmp eax, 0x928     ; cannot handle n >= 0x928
jnc end 
shl eax, 1         ; = n(2)
add eax, 3         ; = n(2)+3
mov ebx, number
mul ebx            ; = n(n(2)+3)
add eax, 1         ; = n(n(2)+3)+1
mov ebx, number
mul ebx            ; = n(n(n(2)+3)+1) = n(n+1)(2n+1)
mov ebx, 6
div ebx
mov result, eax

Rather than checking for overflow, this solution checks the input against the known maximum value that the function can handle. Note that the last multiplication is allowed to overflow, and it will overflow for any input number greater than 0x509. Checking against a known value rather than relying on overflow checks allows the function to handle almost twice as many input values. In fact, the function is able to handle every input whose result fits within 32 bits.

Matthew T. Staebler
+3  A: 
UInt32 sumSquares(const UInt32 number)
{
  __asm
  {
    mov eax, number     // n
    cmd eax, MAX_VALUE
    jg  bad_value

    lea ebx, [eax+1]    // n + 1
    lea ecx, [2*eax+1]  // 2n + 1

    mul ebx
    mul ecx

    shr eax, 1          // divide by 2
    mov ebx, 2863311531 // inverse of 3
    mul ebx             // divide by 3

    ret

    bad_value:
    xor eax, eax        // return 0
    ret
  }
}

http://blogs.msdn.com/devdev/archive/2005/12/12/502980.aspx

Spara

Sparafusile
The lea tricks are interesting to talk about; I believe they can be slower than the straightforward arithmetic on newer processors. The multiply by inverse is a good technique that I did not have the energy to describe yesterday; I think your result is in edx, though, so you need to move it to eax.
Doug Currie
@Doug Currie I'm not sure what's making you think the result is in EDX. All operations are done on EAX and the high order bits of the multiply are in EDX (which are discarded). Am I missing something?Interesting arithmetic tricks are certainly possible, but you did a good job of that earlier so I opted for a more straight-forward example.
Sparafusile
I guessed that you were multiplying by 1431655765 (2^32/3). Reading closer I see that you are not. The multiply by inverse you are using only works when the dividend is an exact multiple of the divisor. Amazingly this is the case for this algorithm. +1
Doug Currie