views:

1049

answers:

12

In my computer this code takes 17 seconds (1000 millons times)

static void Main(string[] args) {
   var sw = new Stopwatch(); sw.Start();
   int r;
   for (int i = 1; i <= 100000000; i++) {
      for (int j = 1; j <= 10; j++) {
         MyDivRem (i,j, out r);
      }
   }
   Console.WriteLine(sw.ElapsedMilliseconds);
}

static int MyDivRem(int dividend, int divisor, out int remainder) {
   int quotient = dividend / divisor;
   remainder = dividend - divisor * quotient;
   return quotient;
}

while Math.DivRem takes 27 seconds. Reflector gives me this code for Math.DivRem:

public static int DivRem(int a, int b, out int result)
{
    result = a % b;
    return (a / b);
}

edit: IL

.method public hidebysig static int32 DivRem(int32 a, int32 b, [out] int32& result) cil managed
{
    .maxstack 8
    L_0000: ldarg.2 
    L_0001: ldarg.0 
    L_0002: ldarg.1 
    L_0003: rem 
    L_0004: stind.i4 
    L_0005: ldarg.0 
    L_0006: ldarg.1 
    L_0007: div 
    L_0008: ret 
}

Theorically it may be faster for computers with multiple cores, but in fact it shouldn't need to do two operations in the first place because x86 CPUs return both the quotient and remainder when they do a integer division using DIV or IDIV ( http://www.arl.wustl.edu/~lockwood/class/cs306/books/artofasm/Chapter_6/CH06-2.html#HEADING2-451 ) !!!

A: 

what happens when you're running .NET on non-x86?

Jimmy
x64 is a superset of x86 and if the CPU is not compatible with x86 is just a matter of using different code for the .net framework for that CPU
ggf31416
Right but .NET is also implemented as Mono and should therefore run on other archs such as ppc etc.
André
+2  A: 

The answer is probably that nobody has thought this a priority - it's good enough. The fact that this has not been fixed with any new version of the .NET Framework is an indicator of how rarely this is used - most likely, nobody has ever complained.

Sander
+1  A: 

Does anyone else get the opposite when testing this?

Math.DivRem = 11.029 sec, 11.780 sec
MyDivRem = 27.330 sec, 27.562 sec
DivRem = 29.689 sec, 30.338 sec

FWIW, I'm running an Intel Core 2 Duo.

The numbers above were with a debug build...

With the release build:

Math.DivRem = 10.314
DivRem = 10.324
MyDivRem = 5.380

Looks like the "rem" IL command is less efficient than the "mul,sub" combo in MyDivRem.

Austin Salonen
64 bits windows?
ggf31416
Nope... XP with SP3
Austin Salonen
I think you got it the wrong way around :)
leppie
Wow, what CPU you have?
leppie
@leppie -- release build data added but the numbers were accurate for the debug build.
Austin Salonen
MyDivRem consistently runs in the low- to mid-5's
Austin Salonen
+4  A: 

Grrr. The only reason for this function to exist is to take advantage of the CPU instruction for this, and they didn't even do it!

Joshua
Wouldn't you have to look at the IL to prove this?
Austin Salonen
No. If they did it that way .NET Reflector would tell you that this function is in the native stubs.
Joshua
A: 

Here's my numbers:

15170 MyDivRem
29579 DivRem (same code as below)
29579 Math.DivRem
30031 inlined

Test slightly changed, added added assignment to return value. Running release build.

Core 2 Duo 2.4

Opinion;

You seemed to have found a nice optimization ;)

leppie
+1  A: 

If I had to take a wild guess, I'd say that whoever implemented Math.DivRem had no idea that x86 processors are capable of doing it in a single instruction, so they wrote it as two operations. That's not necessarily a bad thing if the optimizer works correctly, though it is yet another indicator that low-level knowledge is sadly lacking in most programmers nowadays. I would expect the optimizer to collapse modulus and then divide operations into one instruction, and the people who write optimizers should know these sorts of low-level things...

rmeador
just like the compiler or the JIT should replace Math.Pow(x,3) with x*x*x but that seems be asking too much...
ggf31416
A: 

it's partly in the nature of the beast. there is to the best of my knowledge no general quick way to calculate the remainder of a division. this is going to take a correspondingly large amount of clock-cycles, even with x hundred million transistors.

howlingmadhowie
If you have the result, getting the remainder is a mul-sub which is faster than another div.
Joshua
yeah, but you need to get that value first. i can't see any way of doing this in less than O(ln(n)) time. can you?
howlingmadhowie
+1  A: 

The efficiency may very well depend on the numbers involved. You are testing a TINY fraction of the available problem space, and all front-loaded. You are checking the first 1 million * 10 = 1 billion contiguous input combinations, but the actual problem space is approx 4.2 billion squared, or 1.8e19 combinations.

The performance of general library math operations like this needs to be amortized over the whole problem space. I'd be interested to see the results of a more normalized input distribution.

Chris Ammerman
Plus, I'd say performing 1 billion runs in under 30 seconds is pretty good, so what's the fuss?
Michael Haren
I tested almost the entire space incrementing each variable by a large primes and Math.DivRem is still inefficient
ggf31416
+5  A: 

Wow, that really looks stupid, doesn't it?

The problem is that -- according to the Microsoft Press book ".NET IL Assembler" by Lidin -- the IL rem and div atithmetic instructions are exactly that: compute remainder and compute divisor.

All arithmetical operations except the negation operation take two operands from the stack and put the result on the stack.

Apparently, the way the IL assembler language is designed, it's not possible to have an IL instruction that produces two outputs and pushes them onto the eval stack. Given that limitation, you can't have a division instruction in IL assembler that computes both the way the x86 DIV or IDIV instructions do.

IL was designed for security, verifiability, and stability, NOT for performance. Anyone who has a compute-intensive application and is concerned primarily with performance will be using native code and not .NET.

I recently attended Supercomputing '08, and in one of the technical sessions, an evangelist for Microsoft Compute Server gave the rough rule of thumb that .NET was usually half the speed of native code -- which is exactly the case here!.

Die in Sente
+2  A: 

I accepted Joshua's answer even if it was not the most complete as I agree that the method has no reason to exist if microsoft don't care to make it faster than / and %.
An trivial optimization that adds only 1 line and makes the method 60%-90% faster certainly is not a premature optimization.
I upvoted the other useful answers.

ggf31416
A: 

I would guess that the majority of the added cost is in the set-up and tear-down of the static method call.

As for why it exists, I would guess that it does in part for completeness and in part for the benefit of other languages that may not have easy to use implementations of integer division and modulus computation.

Jim Carnicelli
A: 

It is optimized on .NET 4.0.

On .NET 4.0 it uses NGen.

The results I got with Math.DivRem (debug; release = ~11000ms)

11863
11820
11881
11859
11854

Results I got with MyDivRem (debug; release = ~11000ms)

29177
29214
29472
29277
29196

Project targeted for x86.


Method signatures

DivRem(Int32, Int32, Int32&) : Int32
DivRem(Int64, Int64, Int64&) : Int64

.NET 4.0 Code

[TargetedPatchingOptOut("Performance critical to inline across NGen image boundaries")]
public static int DivRem(int a, int b, out int result)
{
    result = a % b;
    return (a / b);
}

.NET 4.0 IL

.custom instance void System.Runtime.TargetedPatchingOptOutAttribute::.ctor(string) = { string('Performance critical to inline across NGen image boundaries') }
.maxstack 8
L_0000: ldarg.2 
L_0001: ldarg.0 
L_0002: ldarg.1 
L_0003: rem 
L_0004: stind.i4 
L_0005: ldarg.0 
L_0006: ldarg.1 
L_0007: div 
L_0008: ret 

MSDN Reference

BrunoLM