views:

1656

answers:

6

Possible Duplicate:
How can I multiply two 64bit numbers using x86 assembly language?

How do you multiply two 64-bit numbers in x86 assembler?

This question is important for historical reasons.

Presumably, Joel meant 386 assembler.

Related question

How do you multiply two 64bit numbers in assembly

+2  A: 

You can split it in two 32-bit numbers A and B, and, given that

(A0*X + B0) * (A1 * X + B1)

equals

A0 * A1 * XX + (A0 * B1 + B0 * A1 ) * X + B0 * B1

(where X is 2^32)

You would have to account for overflow on every part-product:

// pure math:
A = A0*A1
B = A0*B1 + B0*A1
C = B0*B1

// truncating to 32 bit, and keeping the carries
let Co = C >> 32
let Bb = B + Co
let Bo = Bb >> 32
let Aa = A + Bo

// adding all up
Most significant word   = Aa & 0xFFFF
Second significant word = Bb & 0xFFFF
Third significant word  = C  & 0xFFFF

Do the math like that.

xtofl
I don't think this is a complete answer. What is X? What happens when the answer is 65 bits?
You are right. That's because basically I don't program in assembler - X would be 2^32; I'll add that. Overflow can always happen, and must indeed be handled _a lot_ better.
xtofl
+1 Thanks for the writeup, it's informative how to tackle such things.
Marco van de Voort
A: 

Looks like there's already a thread on this:

http://stackoverflow.com/questions/87771/how-do-you-multiply-two-64bit-numbers-in-assembly

Donnie DeBoer
A: 

if you consider x86_64 to be an x86 platform, gcc suggests this:

movq    a(%rip), %rdx
movq    b(%rip), %rax
imulq   %rdx, %rax
movq    %rax, c(%rip)

Here, a, b and c are all declared static unsigned long long.

unwind
I'm pretty sure that x86_64 was not what Joel meant by x86 when he asked that question. :)
Yeah, me too, but this is still a pretty neat answer, since it (imo) illustrates that his random off-the-wall problem can be pretty much solved in one instruction, without (again, imo) stretching it too far.
unwind
A: 

I think that this would work, but it has been done by hand and not checked in any way...

val1low     dd 12345678h
val1high    dd 9ABCDEF0h
val2low     dd 12345678h
val2high    dd 9ABCDEF0h

ans         dd ?, ?, 0, 0

    mov eax, [val1low]
    mov edx, [val2low]
    mul edx
    mov ans[0], eax
    mov ans[1], edx

    mov eax, [val1high]
    mov edx, [val2low]
    mul edx
    add ans[1], eax
    adc ans[2], edx
    adc ans[3], 0

    mov eax, [val1low]
    mov edx, [val2high]
    mul edx
    add ans[1], eax
    adc ans[2], edx
    adc ans[3], 0

    mov eax, [val1high]
    mov edx, [val2high]
    mul edx
    add ans[2], eax
    adc ans[3], edx
Julian
The goal of this code appears to be 64x64 --> 128 bits.But, it stores into ans[1] twice. I think it is incorrectas it appears to lose the upper half of the 64 bitpartial product of val1ow and val2low. Perhaps that second store should be adc? I also think you lose the carry out of the adc ans[2].
Ira Baxter
I told you it was done entirely by hand, and not checked... I have updated it now.
Julian
+1  A: 

One option you always have when presented with questions like this is to make use of your compiler's ability to produce source-annotated assembly.

For GCC:

gcc -Wa,-aldhs -g [file]

will produce human readable assembly(well, as human readable as any assembly can be...) with source code interspersed. Here's a short program multiplying 2 longs on a 32 bit architecture.

   4:tmp.c         **** void main(void) {
  16 0000 8D4C2404   leal 4(%esp), %ecx
  18 0004 83E4F0     andl $-16, %esp
  19 0007 FF71FC     pushl -4(%ecx)
  21 000a 55         pushl %ebp
  23 000b 89E5       movl %esp, %ebp
  25 000d 51         pushl %ecx
  27 000e 83EC24     subl $36, %esp
   5:tmp.c         ****   long long a = 4000000;
  30 0011 C745E000   movl $4000000, -32(%ebp)
  31 0018 C745E400   movl $0, -28(%ebp)
   6:tmp.c         ****   long long b = 4000000;
  33 001f C745E800   movl $4000000, -24(%ebp)
  34 0026 C745EC00   movl $0, -20(%ebp)
   7:tmp.c         ****   long long c = a*b;
  36 002d 8B45E4     movl -28(%ebp), %eax
  37 0030 89C1       movl %eax, %ecx
  38 0032 0FAF4DE8   imull -24(%ebp), %ecx
  39 0036 8B45EC     movl -20(%ebp), %eax
  40 0039 0FAF45E0   imull -32(%ebp), %eax
  41 003d 01C1       addl %eax, %ecx
  42 003f 8B45E8     movl -24(%ebp), %eax
  43 0042 F765E0     mull -32(%ebp)
  44 0045 01D1       addl %edx, %ecx
  45 0047 89CA       movl %ecx, %edx
  46 0049 8945F0     movl %eax, -16(%ebp)
  47 004c 8955F4     movl %edx, -12(%ebp)
  48 004f 8945F0     movl %eax, -16(%ebp)
  49 0052 8955F4     movl %edx, -12(%ebp)
   8:tmp.c         **** }
Falaina
How can this code be right? I see it computing 3 partial products, and I see a partial-sum of the partial products (addl), but given there's a 64 bit result, something needs to propagate a carry bit between the 32 bit halves. And there's no add carry instruction anywhere in sight. What am I missing?
Ira Baxter
Hmm. Answered my own question. There isn't a carry out of the bottom 32 bits of the lower partial product.
Ira Baxter
A: 

Faster code, using the ideas that you probably only want a 64 bit product, that you don't care about overflow or signed numbers. (A signed version is similar).

(This is untested, but I think its right)


MUL64_MEMORY:
     mov edi, val1high
     mov esi, val1low
     mov ecx, val2high
     mov ebx, val2low
MUL64_EDIESI_ECXEBX:
     mov eax, edi
     mul ebx
     xch eax, ebx  ; partial product top 32 bits
     mul esi
     xch esi, eax ; partial product lower 32 bits
     add ebx, edx
     mul ecx
     add ebx, eax  ; final upper 32 bits
; answer here in EBXESI

(Edit: first draft had useless adc instructions. Result looks much like code emitted from GCC compiler [See other answer] except this fits entirely in registers)

Ira Baxter