ansaurus

Question

Answer 1

+5 A:

TickCount is not a reliable timer; you should use .NET's Stopwatch class. (I don't know what the Delphi equivalent is.)

Also, are you running a Release build?
Do you have a debugger attached?

SLaks 2010-04-18 15:28:14
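For the native side of such a comparison, a monotonic clock plays roughly the role that Stopwatch plays in .NET. The following is only a minimal C++ sketch, assuming a C++11 compiler; `elapsed_ms` is a hypothetical helper name and does not come from this thread.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper (not from the thread): runs `work` once and returns
// the elapsed time in milliseconds, measured with a monotonic clock.
template <typename Work>
double elapsed_ms(Work&& work)
{
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    work();
    const auto stop = clock::now();
    // Convert the clock's native duration to fractional milliseconds.
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Calling `elapsed_ms` with a lambda that wraps the loop under test gives a reading comparable to `Stopwatch.Elapsed`. It still says nothing about whether the build is optimised or a debugger is attached.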

Well, while it's true that GetTickCount does not have a millisecond resolution, it sure can handle these time spans: a half second and almost four seconds. So that should not be the issue.

Andreas Rejbrand 2010-04-18 15:31:56

@Andreas: It can depend on system load and other things.

SLaks 2010-04-18 15:32:34

Eric Lippert even has a blog post about using Stopwatch for accuracy: http://blogs.msdn.com/ericlippert/archive/2010/04/08/precision-and-accuracy-of-datetime.aspx

juharr 2010-04-18 15:33:35

Just tested it. Release build and no debugger yields ca 500 ms on my machine. Switching to debug build or running with debugger (or both) yields ca 4000 ms. The difference between Stopwatch and TickCount is insignificant.

dtb 2010-04-18 15:34:53

Yes, GetTickCount is accurate to about 15 ms on most systems. In this case that does not matter.

Runner 2010-04-18 15:52:47
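One rough way to see how coarse a timer is: spin until its reading changes and look at the size of the step. The sketch below is an assumption-laden illustration. It uses `std::chrono` clocks rather than GetTickCount so that it stays portable, and `observed_tick` is a hypothetical name.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: spins until the clock's reported value changes and
// returns the size of that step, i.e. the smallest advance we observed.
template <typename Clock>
typename Clock::duration observed_tick()
{
    const auto first = Clock::now();
    auto next = Clock::now();
    while (next == first)      // busy-wait until the reading advances
        next = Clock::now();
    return next - first;
}
```

A single call can be inflated by scheduling delays, so taking the minimum over several calls gives a better estimate. Either way, a step of a few milliseconds is small next to the 500 ms and 4000 ms figures discussed here.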

If we had a nickel for every time somebody "profiles" a .NET app in debug mode and posts a question about its performance on Stack Overflow...

Aaronaught 2010-04-18 18:25:14

Answer 2

+25 A:

Delphi is compiled to native code, whereas C# is compiled to CIL (bytecode for the CLR), which is then translated to native code at runtime. That said, C# does use JIT compilation, so you might expect the timings to be more similar, but it is not a given.

It would be useful if you could describe the hardware you ran this on (CPU, clock rate).

I do not have access to Delphi to repeat your experiment, but using native C++ vs C# and the following code:

VC++ 2008

#include <iostream>
#include <windows.h>

int main(void)
{
    int tick = GetTickCount() ;
    for (int i = 0; i < 1000000000; ++i)
    {
    }
    tick = GetTickCount() - tick;
    std::cout << tick << " ms" << std::endl  ; 
}

C#

using System;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            int tick = System.Environment.TickCount;
            for (int i = 0; i < 1000000000; ++i)
            {
            }
            tick = System.Environment.TickCount - tick;
            Console.Write( tick.ToString() + " ms" ) ; 
        }
    }
}

I initially got:

C++  2792ms
C#   2980ms

However I then performed a Rebuild on the C# version and ran the executable in <project>\bin\release and <project>\bin\debug respectively directly from the command line. This yielded:

C# (release):  720ms
C# (debug):    3105ms

So I reckon that is where the difference truly lies: you were running the debug version of the C# code from the IDE.

In case you are thinking that C++ is then particularly slow, I ran that as an optimised release build and got:

C++ (Optimised): 0ms

This is not surprising because the loop is empty and the control variable is not used outside the loop, so the optimiser removes it altogether. To avoid that I declared i as volatile, with the following result:

C++ (volatile i): 2932ms

My guess is that the C# implementation also removed the loop and that the 720ms is from something else; this may explain most of the difference between the timings in the first test.

What Delphi is doing I cannot tell; you might look at the generated assembly code to see.

All the above tests on AMD Athlon Dual Core 5000B 2.60GHz, on Windows 7 32bit.

Clifford 2010-04-18 15:32:53

My assembly code runs in 0.390, so I think the C# loop is getting executed.

Behrooz 2010-04-18 16:32:59

@Behrooz: Quite possibly. I chose not to analyse it further, and did not really believe my own hypothesis! The point was to demonstrate that great care needs to be taken when making such comparisons, and that it is possible to get comparable timings from C#. Also, as a benchmark, it sucks. Knowing how long it takes to do something pointless does not tell you much! Real work generally requires at least memory access, and this loop can be performed entirely at the register level inside the CPU.

Clifford 2010-04-18 16:46:41

@Clifford: yes.

Behrooz 2010-04-18 17:06:10

Making i volatile is going to make the compiler regrab i from memory every time it wants to increment or check its value. This is going to be incredibly slow ...

Goz 2010-04-20 09:35:03

@Goz: Yes, that occurred to me, but the loop either does something or nothing; and without volatile it is equivalent to nothing, and is therefore optimised to nothing. One solution is to move the scope of i out of the loop and access it after the loop; but I suspect the optimiser will just then reduce it to an assignment. It is not worth trying to fix a pointless test however.

Clifford 2010-04-20 18:07:44

But it doesn't do nothing. Incrementing and comparing against a single register 1,000,000,000 times is VERY different from grabbing the value from cache, incrementing it, writing it back to the cache, grabbing it from the cache and comparing it. One is 2 ops on a register ... the other is 2 ops on a register, 2 reads and 1 write. The 2 reads and 1 write will be significantly slower than both the increment AND the compare.

Goz 2010-04-20 18:24:38
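A middle ground between an empty loop (which the optimiser deletes) and a volatile counter (which forces memory traffic on every iteration) is a loop whose result depends on every iteration and is then used by the caller. The sketch below is one hypothetical way to do that, using a xorshift step as stand-in work. It is not what anyone in this thread measured.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical workload: one xorshift32 step per iteration. Each step depends
// on the previous one, so the work can stay in registers, yet the loop cannot
// simply be dropped as long as the caller uses the returned value.
std::uint32_t xorshift_loop(std::uint32_t seed, std::uint32_t iterations)
{
    std::uint32_t x = seed;
    for (std::uint32_t i = 0; i < iterations; ++i)
    {
        x ^= x << 13;
        x ^= x >> 17;
        x ^= x << 5;
    }
    return x;
}
```

Printing or otherwise consuming the return value keeps the computation observable. An optimiser is still free to transform the loop in ways that are hard to predict, so inspecting the generated assembly remains the only way to be sure what was timed.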

Answer 3

+1 A:

You should attach a debugger and take a look at the machine code generated by each.

C. Dragon 76 2010-04-18 15:36:14

Answer 4

+2 A:

This is the C# disassembly:
DEBUG:

// int i = 0; while (++i != 1000000000) ;//==for(int i ...blah blah blah)
0000004e 33 D2            xor         edx,edx 
00000050 89 55 B8         mov         dword ptr [ebp-48h],edx 
00000053 90               nop              
00000054 EB 00            jmp         00000056 
00000056 FF 45 B8         inc         dword ptr [ebp-48h] 
00000059 81 7D B8 00 CA 9A 3B cmp         dword ptr [ebp-48h],3B9ACA00h 
00000060 0F 95 C0         setne       al   
00000063 0F B6 C0         movzx       eax,al 
00000066 89 45 B4         mov         dword ptr [ebp-4Ch],eax 
00000069 83 7D B4 00      cmp         dword ptr [ebp-4Ch],0 
0000006d 75 E7            jne         00000056

As you can see, it is a waste of CPU.
EDIT:
RELEASE:

   //unchecked
   //{
   //int i = 0; while (++i != 1000000000) ;//==for(int i ...blah blah blah)
00000032 33 D2            xor         edx,edx 
00000034 89 55 F4         mov         dword ptr [ebp-0Ch],edx 
00000037 FF 45 F4         inc         dword ptr [ebp-0Ch] 
0000003a 81 7D F4 00 CA 9A 3B cmp         dword ptr [ebp-0Ch],3B9ACA00h 
00000041 75 F4            jne         00000037 
   //}

EDIT:
And this is the C++ version (inline assembly), running about 9x faster on my machine:

    __asm
    {
        PUSH ECX
        PUSH EBX
        XOR  ECX, ECX
        MOV  EBX, 1000000000
NEXT:   INC  ECX
        CMP  ECX, EBX
        JS   NEXT
        POP  EBX
        POP  ECX
    }

Behrooz 2010-04-18 15:40:23

Is this from a Release build?

SLaks 2010-04-18 15:45:25

@SLaks: No, it is in debug mode.

Behrooz 2010-04-18 15:51:54

Debug build can add a lot of overhead.

ewanm89 2010-04-18 15:59:59

@ewanm89: Yes, the release version runs in 2.77 while in debug it runs in 3.14 (OMG, it is Pi).

Behrooz 2010-04-18 16:24:41

Answer 5

+6 A:

If this is intended as a benchmark, it's an exceptionally bad one, as in both cases the loop can be optimized away, so you have to look at the generated machine code to see what's going on. If you use release mode for C#, the following code

 Stopwatch sw = Stopwatch.StartNew();
 for (int i = 0; i < 1000000000; ++i){ }
 sw.Stop();
 Console.WriteLine(sw.Elapsed);

is transformed by the JITter to this:

 push        ebp 
 mov         ebp,esp 
 push        edi 
 push        esi 
 call        67CDBBB0 
 mov         edi,eax 
 xor         eax,eax               ; i = 0
 inc         eax                   ; ++i
 cmp         eax,3B9ACA00h         ; i < 1000000000?
 jl          0000000E              ; true: loop again
 mov         ecx,edi 
 cmp         dword ptr [ecx],ecx 
 call        67CDBC10 
 mov         ecx,66DDAEDCh 
 call        FFE8FBE0 
 mov         esi,eax 
 mov         ecx,edi 
 call        67CD75A8 
 mov         ecx,eax 
 lea         eax,[esi+4] 
 mov         dword ptr [eax],ecx 
 mov         dword ptr [eax+4],edx 
 call        66A94C90 
 mov         ecx,eax 
 mov         edx,esi 
 mov         eax,dword ptr [ecx] 
 mov         eax,dword ptr [eax+3Ch] 
 call        dword ptr [eax+14h] 
 pop         esi 
 pop         edi 
 pop         ebp 
 ret

Novox 2010-04-18 15:52:49

Answer 6

+4 A:

The Delphi compiler uses the for loop counter downwards (if possible); the above code sample is compiled to:

Unit1.pas.42: Tick := GetTickCount();
00489367 E8B802F8FF       call GetTickCount
0048936C 8BF0             mov esi,eax
Unit1.pas.43: for I := 0 to 1000000000 do
0048936E B801CA9A3B       mov eax,$3b9aca01
00489373 48               dec eax
00489374 75FD             jnz $00489373

Serg 2010-04-18 17:12:53
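The constant loaded above, $3b9aca01, is 1000000001: a Pascal `for I := 0 to N` loop includes both bounds, so it runs N + 1 times, and counting down from N + 1 to zero performs the same number of iterations. A small C++ sketch of that equivalence follows. The function names are hypothetical, and the code only mirrors the iteration counts, not the Delphi code generator.

```cpp
#include <cassert>
#include <cstdint>

// Counts upwards like the Pascal source: for I := 0 to n do (n inclusive).
std::uint64_t iterations_up(std::uint32_t n)
{
    std::uint64_t count = 0;
    for (std::uint64_t i = 0; i <= n; ++i)
        ++count;
    return count;
}

// Counts downwards like the generated code: load n + 1, then dec / jnz.
std::uint64_t iterations_down(std::uint32_t n)
{
    std::uint64_t count = 0;
    std::uint64_t remaining = static_cast<std::uint64_t>(n) + 1;
    do
    {
        ++count;
        --remaining;           // corresponds to: dec eax
    } while (remaining != 0);  // corresponds to: jnz
    return count;
}
```

Both functions return n + 1 for any n. The transformation is only valid when the loop body never reads the counter, which is the case for an empty loop.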

Answer 7

+2 A:

You are comparing native code against VM JITted code, and that is not fair. Native code will be ALWAYS faster since the JITter can not optimize the code like a native compiler can.

That said, comparing Delphi against C# is not fair at all, a Delphi binary will win always (faster, smaller, without any kind of dependencies, etc).

Btw, I'm sadly amazed how many posters here don't know these differences... or maybe you just hurt some .NET zealots who try to defend C# against anything that shows there are better options out there.

sharper 2010-04-18 19:34:09

Yes, if I ever have to code an empty loop, I'll keep Delphi in mind as an option. But seriously, the JITter can (at least in theory) do better optimization than a native compiler. The compiler only has access to compile-time (static) information to do its optimization, but the JITter also has runtime (dynamic) information, to optimize according to the real way the code is used. The real advantage is that native code execution time can be *predicted more accurately* than JITted code, which makes it more suited to realtime systems.

ckarras 2010-04-19 01:03:32

-1: see ckarras - and "Always" is a pretty strong word (especially when in bold face)...

RaphaelSP 2010-04-19 17:32:28

"Native code will be ALWAYS faster": I don't think so. For instance, Sun’s HotSpot compiler can even inline *virtual method calls*. In fact, jitted code can potentially be better optimized, because of the additional runtime information.

Novox 2010-04-24 17:29:53

Answer 8

A:

"// int i = 0; while (++i != 1000000000) ;"

That's interesting.

while (++i != x) is not the same as for (; i != x; i++)

The difference is that the while loop doesn't execute the loop for i = 0.

Try it out by running something like this:


int i;

for (i = 0; i < 5; i++)
    Console.WriteLine(i);

i = 0;
while (++i != 5)
    Console.WriteLine(i);

hehewaffles 2010-04-19 00:38:06
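The same difference can be checked without reading console output by recording which values each loop body sees. This is a C++ rendering of the snippet above, with hypothetical function names. The loop semantics are the same as in C# for this case.

```cpp
#include <cassert>
#include <vector>

// Values seen by the body of: for (i = 0; i < limit; i++)
std::vector<int> visited_by_for(int limit)
{
    std::vector<int> seen;
    for (int i = 0; i < limit; i++)
        seen.push_back(i);
    return seen;
}

// Values seen by the body of: i = 0; while (++i != limit)
std::vector<int> visited_by_while(int limit)
{
    assert(limit >= 1);  // otherwise ++i would never equal limit
    std::vector<int> seen;
    int i = 0;
    while (++i != limit)
        seen.push_back(i);
    return seen;
}
```

With a limit of 5, the for loop visits 0 through 4 (five iterations), while the while loop visits 1 through 4 (four iterations), because the increment happens before the first test.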

I always thought ++i is not i++

Behrooz 2010-07-13 15:58:58

Answer 9

A:

Delphi would almost certainly optimise that loop to execute in reverse order (i.e. DOWNTO zero rather than FROM zero). Delphi does this whenever it determines it is "safe" to do so, presumably because either subtraction or checking against zero is faster than addition or checking against a non-zero number.

What happens if you try both cases specifying the loops to execute in reverse order?

Graza 2010-04-19 17:14:05


Why looping in Delphi faster than C#?
