views:

241

answers:

1

In Delphi, the declaration of the DivMod function is

procedure DivMod(Dividend: Cardinal; Divisor: Word;
  var Result, Remainder: Word);

Thus, the divisor, result, and remainder cannot be grater than 65535, a rather severe limitation. Why is this? Why couldn't the delcaration be

procedure DivMod(Dividend: Cardinal; Divisor: Cardinal;
  var Result, Remainder: Cardinal);

The procedure is implemented using assembly, and is therefore probably extremely fast. Would it not be possible for the code

    PUSH    EBX
    MOV     EBX,EDX
    MOV     EDX,EAX
    SHR     EDX,16
    DIV     BX
    MOV     EBX,Remainder
    MOV     [ECX],AX
    MOV     [EBX],DX
    POP     EBX

to be adapted to cardinals? How much slower is the naïve attempt

procedure DivModInt(const Dividend: integer; const Divisor: integer; out result: integer; out remainder: integer);
begin
  result := Dividend div Divisor;
  remainder := Dividend mod Divisor;
end;

that is not (?) limited to 16-bit integers?

+9  A: 

Such a procedure is possible. I have not tested the code enough, but I think it's OK:

procedure DivMod32(Dividend, Divisor: Cardinal; var Quotient, Remainder: Cardinal);
asm
        PUSH EBX
        MOV  EBX,EDX
        XOR  EDX,EDX
        DIV  EBX
        MOV  [ECX],EAX
        MOV  EBX,Remainder
        MOV  [EBX],EDX
        POP  EBX
end;

Updated:

even more efficient:

function DivMod32(Dividend, Divisor: Cardinal; var Remainder: Cardinal): Cardinal;
asm
        PUSH EBX
        MOV  EBX,EDX
        XOR  EDX,EDX
        DIV  EBX
        MOV  [ECX],EDX
        POP  EBX
end;

Updated 2:

You can see the assembly code generated by Delphi compiler in the Disassembly (or CPU) window. Eg, the procedure

procedure DivMod32(const Dividend: Cardinal; const Divisor: Cardinal;
                    out result: Cardinal; out remainder: Cardinal);
begin
  result := Dividend div Divisor;
  remainder := Dividend mod Divisor;
end;

generates code

Unit1.pas.28: begin
0046CC94 55               push ebp
0046CC95 8BEC             mov ebp,esp
0046CC97 53               push ebx
0046CC98 56               push esi
0046CC99 8BF2             mov esi,edx
0046CC9B 8BD8             mov ebx,eax
Unit1.pas.29: result := Dividend div Divisor;
0046CC9D 8BC3             mov eax,ebx
0046CC9F 33D2             xor edx,edx
0046CCA1 F7F6             div esi
0046CCA3 8901             mov [ecx],eax
Unit1.pas.30: remainder := Dividend mod Divisor;
0046CCA5 8BC3             mov eax,ebx
0046CCA7 33D2             xor edx,edx
0046CCA9 F7F6             div esi
0046CCAB 8B4508           mov eax,[ebp+$08]
0046CCAE 8910             mov [eax],edx
Unit1.pas.31: end;
0046CCB0 5E               pop esi
0046CCB1 5B               pop ebx
0046CCB2 5D               pop ebp
0046CCB3 C20400           ret $0004

This code is linear (contains no jumps) and modern processors (with long instruction pipeline) are very efficient in executing linear code. So though my DivMode32 implementation is about 3 times shorter, 60% is a reasonable estimate.

Serg
Thank you very much. Your ASM code takes only approx. 60 % of the time compared to the naive approach using the div and mod operators (at least on my i7 system). (Isn't that strange? Should not the Delphi compiler create efficient code?)Why do you think that the RTL only offers a 16 bit version of DivMod?
Andreas Rejbrand
There's only so much a compiler can be expected to do. Serg is (ab)using the fact that both of those lines require the same division, and that he can get both the quotient and the remainder at the same time. Also note that a fair comparison requires you to compare with the first one, not the second one, due to the difference in method signatures. That's 8 instructions in the hand-written ASM code, versus the 19 Delphi creates. Remove the 3-4 instructions for the extra division, and you're down to 8 vs. 15 instructions.
Michael Madsen
What you have left mostly differs due to different setup/teardown code. In D2010, 4 of those instructions are actually added to his code anyway (EBP manipulation and the RET). Now we only have a 3 instruction advantage if you remove the extra division from the generated code - and, as far as I can tell, the difference comes from Serg choosing his registers a bit more cleverly (2 of which are cut by Serg not touching ESI at all). He can do that because he knows the intent of your code better than the compiler - it probably does things in a safer way, because it doesn't know what your intent is.
Michael Madsen
I did compare it to the procedure.
Andreas Rejbrand
Thank you all for your very good comments!
Andreas Rejbrand
@Michael: Using both register results after a `DIV` or `IDIV` is something any optimizing compiler certainly is *expected* to do. Same for good allocation of registers. The Delphi compiler is simply lacking, on both accounts.
mghie