ansaurus

Question

x86 MUL Instruction from VS 2008/2010

Answer 1

+1 A:

My intuition tells me that the compiler chose IMUL arbitrarily (or chose whichever of the two was faster), since the low 32 bits of the result are the same whether it uses unsigned MUL or signed IMUL. In the one-operand form, a 32-bit multiplication produces a 64-bit result spanning two registers, EDX:EAX. The upper half goes into EDX, which is essentially ignored since we only care about the 32-bit result in EAX. One-operand IMUL puts the upper half of the signed product into EDX instead, but again, we don't care since we're only interested in the 32-bit result.

Jeff M 2010-10-28 03:20:50

Answer 2

+3 A:

According to http://gmplib.org/~tege/x86-timing.pdf, the IMUL instruction has a lower latency and higher throughput (if I'm reading the table correctly). Perhaps VS is simply using the faster instruction (that's assuming that IMUL and MUL always produce the same output).

I don't have Visual Studio handy, so I tried the same thing with GCC. I also always get some variation of IMUL.

This:

unsigned int func(unsigned int a, unsigned int b)
{ 
    return a * b;
}

Compiles to this (with -O2):

_func:
LFB2:
        pushq   %rbp
LCFI0:
        movq    %rsp, %rbp
LCFI1:
        movl    %esi, %eax
        imull   %edi, %eax
        movzbl  %al, %eax
        leave
        ret

Seth 2010-10-28 03:38:34

Answer 3

+7 A:

imul is more flexible because its two- and three-operand forms accept fairly arbitrary operand registers, whereas mul necessarily uses eax as one of the inputs and writes the result into edx:eax. imul makes things easier for the compiler.

imul is nominally for signed integer types, but when multiplying two 32-bit values, the least significant 32 bits of the result are the same whether you consider the values to be signed or unsigned. In other words, the difference between a signed and an unsigned multiply becomes apparent only if you look at the "upper" half of the result, which mul puts in edx and the two-operand imul puts nowhere. In C, results of arithmetic operations have the same type as the operands (if you multiply two ints together, you get an int, not a long long): the "upper half" is not retained. Hence, the C compiler only needs what imul provides, and since imul is easier to use than mul, the C compiler uses imul.
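The claim about the low and high halves can be checked in plain C. The sketch below is only an illustration: the helper names (full_mul, full_imul, low_half, high_half) are made up for this example, and it assumes the usual two's complement conversion from uint32_t to int32_t that mainstream x86 compilers provide.

```c
#include <stdint.h>

/* Hypothetical helpers (not from the thread): each one models the full
   64-bit product that a one-operand mul / imul would leave in edx:eax. */

/* Unsigned 32x32 -> 64 product, as mul would compute it. */
static uint64_t full_mul(uint32_t a, uint32_t b)
{
    return (uint64_t)a * b;
}

/* Signed 32x32 -> 64 product, as one-operand imul would compute it.
   The same bit patterns are reinterpreted as signed before widening;
   this assumes two's complement conversion (true on x86 compilers). */
static uint64_t full_imul(uint32_t a, uint32_t b)
{
    return (uint64_t)((int64_t)(int32_t)a * (int32_t)b);
}

/* The part that would land in eax. */
static uint32_t low_half(uint64_t p)
{
    return (uint32_t)p;
}

/* The part that would land in edx. */
static uint32_t high_half(uint64_t p)
{
    return (uint32_t)(p >> 32);
}
```

For a = 0xFFFFFFFF and b = 2, low_half gives 0xFFFFFFFE for both products, while high_half gives 1 for full_mul and 0xFFFFFFFF for full_imul. Only the half that C discards depends on signedness.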

As a second step, since C compilers use imul and not mul, Intel and AMD invest more effort in optimizing imul than mul, making the former faster on recent processors. This makes imul even more attractive.

mul is useful when implementing big-number arithmetic. In C, in 32-bit mode, you should get some mul invocations by multiplying long long values together. But, depending on the compiler and OS, those mul opcodes may be hidden in some dedicated helper function, so you will not necessarily see them. In 64-bit mode, long long has only 64 bits, not 128, and the compiler will simply use imul.
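As a rough sketch of where the full edx:eax result is actually needed, here is a 64-bit multiply assembled from 32-bit pieces, roughly the way a 32-bit compiler or its runtime helper might do it. The function names are hypothetical, and the exact instruction selection is an assumption that varies by compiler; the point is that only the low-by-low partial product needs all 64 bits (a natural fit for mul), while the cross terms need only their low 32 bits.

```c
#include <stdint.h>

/* 32x32 -> 64 widening multiply: this is the case where the upper half
   of the product matters, i.e. what mul leaves in edx:eax. */
static uint64_t widening_mul(uint32_t a, uint32_t b)
{
    return (uint64_t)a * b;
}

/* 64x64 -> 64 multiply built from 32-bit halves (hypothetical helper).
   x*y mod 2^64 = xl*yl + 2^32 * (xl*yh + xh*yl); the xh*yh term is
   shifted out entirely. */
static uint64_t mul64_from_halves(uint64_t x, uint64_t y)
{
    uint32_t xl = (uint32_t)x, xh = (uint32_t)(x >> 32);
    uint32_t yl = (uint32_t)y, yh = (uint32_t)(y >> 32);

    uint64_t low   = widening_mul(xl, yl);  /* needs all 64 bits */
    uint32_t cross = xl * yh + xh * yl;     /* only the low 32 bits survive */

    return low + ((uint64_t)cross << 32);
}
```

Comparing mul64_from_halves(x, y) against a native x * y on uint64_t values is an easy sanity check of the decomposition.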

Thomas Pornin 2010-10-28 07:51:51

Are you certain of the causality of the IMUL/MUL optimizations? Is it possible that VS prefers IMUL because it happens to already be faster (rather than compilers preferring it, causing Intel/AMD to make it faster)?

Mike S 2010-10-28 14:47:59

@Mike: on the 80386, `mul` and `imul` offer the same speed, and C compilers were already using `imul` because of the convenience of choosing the registers. So I think that compilers chose first, and processor vendors followed, not the other way round.

Thomas Pornin 2010-10-28 17:25:24
