I was worried about C#'s speed for heavy calculations, the kind where you need raw CPU power.

I always thought that C++ is much faster than C# when it comes to calculations, so I did some quick tests. The first test counts the prime numbers below an integer n; the second test computes some pandigital numbers. The idea for the second test comes from here: Pandigital Numbers

C# prime computation:

using System;
using System.Diagnostics;

class Program
{
    static int primes(int n)
    {
        uint i, j;
        int countprimes = 0;

        for (i = 1; i <= n; i++)
        {
            bool isprime = true;

            for (j = 2; j <= Math.Sqrt(i); j++)
                if ((i % j) == 0)
                {
                    isprime = false;
                    break;
                }

            if (isprime) countprimes++;
        }

        return countprimes;
    }

    static void Main(string[] args)
    {
        int n = int.Parse(Console.ReadLine());
        Stopwatch sw = new Stopwatch();

        sw.Start();
        int res = primes(n);
        sw.Stop();
        Console.WriteLine("I found {0} prime numbers between 0 and {1} in {2} msecs.", res, n, sw.ElapsedMilliseconds);
        Console.ReadKey();
    }
}

C++ variant:

#include <iostream>
#include <ctime>
#include <cmath>

using namespace std;

int primes(unsigned long n) {
    unsigned long i, j;
    int countprimes = 0;

    for (i = 1; i <= n; i++) {
        int isprime = 1;
        for (j = 2; j <= sqrt((float)i); j++)
            if (!(i % j)) {
                isprime = 0;
                break;
            }
        countprimes += isprime;
    }
    return countprimes;
}

int main() {
    unsigned long n;
    int res;
    cin >> n;
    clock_t start = clock();

    res = primes(n);
    // convert clock ticks to milliseconds so the label below is accurate
    long tprime = (long)((clock() - start) * 1000.0 / CLOCKS_PER_SEC);
    cout << "\nI found " << res << " prime numbers between 1 and " << n << " in " << tprime << " msecs.";
    return 0;
}

When I ran the tests to find primes < 100,000, the C# variant finished in 0.409 seconds and the C++ variant in 0.614 seconds. When I ran them for 1,000,000, C# finished in 6.039 seconds and C++ in about 12.987 seconds.

Pandigital test in C#:

using System;
using System.Diagnostics;

class Program
{
    static bool IsPandigital(int n)
    {
        int digits = 0; int count = 0; int tmp;

        for (; n > 0; n /= 10, ++count)
        {
            // n - ((n / 10) * 10) is the last decimal digit; digit d maps to bit d-1.
            // If setting that bit leaves the mask unchanged, the digit was already seen.
            if ((tmp = digits) == (digits |= 1 << (n - ((n / 10) * 10) - 1)))
                return false;
        }

        // pandigital iff exactly the digits 1..count were each seen once
        return digits == (1 << count) - 1;
    }

    static void Main()
    {
        int pans = 0;
        Stopwatch sw = new Stopwatch();
        sw.Start();

        for (int i = 1; i <= 123456789; i++)
        {
            if (IsPandigital(i))
            {
                pans++;
            }
        }
        sw.Stop();
        Console.WriteLine("{0}pcs, {1}ms", pans, sw.ElapsedMilliseconds);
        Console.ReadKey();
    }
}

Pandigital test in C++:

#include <iostream>
#include <ctime>

using namespace std;

int IsPandigital(int n)
{
    int digits = 0; int count = 0; int tmp;

    for (; n > 0; n /= 10, ++count)
    {
        if ((tmp = digits) == (digits |= 1 << (n - ((n / 10) * 10) - 1)))
            return 0;
    }

    return digits == (1 << count) - 1;
}

int main() {
    int pans = 0;
    clock_t start = clock();

    for (int i = 1; i <= 123456789; i++)
    {
        if (IsPandigital(i))
        {
            pans++;
        }
    }
    // convert clock ticks to milliseconds
    long ptime = (long)((clock() - start) * 1000.0 / CLOCKS_PER_SEC);
    cout << "\nPans: " << pans << " time: " << ptime << " ms";
    return 0;
}

The C# variant runs in 29.906 seconds and the C++ one in about 36.298 seconds.

I didn't touch any compiler switches; both the C# and C++ programs were compiled with debug options. Before running the tests I was worried that C# would lag well behind C++, but now it seems that there is a pretty big speed difference in C#'s favor.

Can anybody explain this? C# is jitted and C++ is compiled to native code, so it would be normal for the C++ variant to be faster than the C# one.

Thanks for the answers!

I've redone all the tests in the Release configuration.

First test (prime numbers)

C# (numbers < 100,000): 0.189 seconds; C++ (numbers < 100,000): 0.036 seconds

C# (numbers < 1,000,000): 5.300 seconds; C++ (numbers < 1,000,000): 1.166 seconds

Second test (pandigital numbers):

C#: 21.224 seconds C++: 4.104 seconds

So everything has changed; now C++ is much faster. My mistake was running the tests in the Debug configuration. Can I see some speed improvement if I run the C# executables through ngen?

The reason I tried to compare C# and C++ is that I know the basics of both and I wanted to learn an API for GUI work. WPF looks nice, and since I'm targeting the desktop I wanted to see whether C# can deliver enough speed and performance when it comes to using sheer CPU power for various computations (file archivers, cryptography, codecs, etc.). But sadly, it seems that C# can't keep pace with C++ when it comes to speed.

So I'm assuming I will be forever stuck with this question, Tough question on WPF, Win32, MFC, and I'll never find a suitable API.

+3  A: 

Recompile the C++ program with full optimizations turned on and rerun the tests. The C# JIT optimizes the code when it's jitted, so you compared optimized C#/.NET code against unoptimized C++.
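
For example, with the Visual Studio command-line tools, optimized builds of both programs could be produced roughly like this (the file names are hypothetical and the exact switches depend on your setup):

csc /optimize+ primes.cs
cl /O2 /EHsc primes.cpp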

Brian Ensink
+10  A: 

You need to compile the C++ in release mode and enable optimizations to get the performance results you are looking for.

Axel Gneiting
I will try that too.
Mack
+11  A: 

The prime generator in C++ is not correct:

i^(1/2) == i xor 0

^ is the bitwise XOR operator and / is integer division.

1st edit: it is actually correct, just inefficient. Since i xor 0 == i, the trial-division loop doesn't stop at sqrt(i) but at i.

2nd edit:

The sieving can be done a bit more efficiently (you only need to compute sqrt(n)). This is how I implemented the Sieve of Eratosthenes for my own use (this is in C99, though):

#include <math.h>
#include <string.h>

void sieve(const int n, unsigned char* primes)
{
        // start with every number marked as (potentially) prime
        memset(primes, 1, (n+1) * sizeof(unsigned char));

        // sieve of eratosthenes
        primes[0] = primes[1] = 0;
        int m = floor(sqrt(n));
        for (int i = 2; i <= m; i++)
                if (primes[i]) // no need to remove multiples of i if it is not prime
                        for (int j = i; j <= (n/i); j++)
                                primes[i*j] = 0;
}
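
A minimal driver for it might look like this (a hypothetical usage sketch; it assumes a buffer of n+1 bytes and counts the primes up to n):

#include <stdio.h>
#include <stdlib.h>

int main(void) {
        const int n = 1000000;
        unsigned char* primes = malloc((n + 1) * sizeof(unsigned char));
        if (!primes) return 1;

        sieve(n, primes);   // after this, primes[i] is 1 iff i is prime

        int count = 0;
        for (int i = 2; i <= n; i++)
                count += primes[i];

        printf("%d primes <= %d\n", count, n);
        free(primes);
        return 0;
}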
sisis
I corrected it. Thanks.
Mack
I wonder, though, whether or not this actually makes it run slower, because the C# version has to actually evaluate `Math.Sqrt(i)` on every iteration, which is many many times more expensive than the single-instruction XOR, so this little mistake might actually make the C++ version *faster* because both versions of the program are inefficient.
Aaronaught
@Aaronaught: Except the C# version is looping to Sqrt(i) and the c++ version is looping to i.
Billy ONeal
@BillyONeal: Completely true, for large values of `n` the C++ version will always be slower because it loops more. What I meant was that the benchmark has been made sensitive to `n` and it shouldn't be.
Aaronaught
@Aaronaught: I doubt C# will recompute it. Nothing in the loop will change it. Contrarily, it should be cached in C++ explicitly.
GMan
@Aaronaught: True, but I think the example data size (1,000,000) qualifies as "large N" in this case :)
Billy ONeal
@GMan: *You* may know that nothing in the loop will change it, but the C# compiler cannot make that assumption. It *will* recompute it on every iteration. If you don't believe me, try it out.
Aaronaught
@Aaronaught: I find that surprising. At least in C++, if the definition of `sqrt` were visible it would never recalculate it.
GMan
@GMan: That is why I brought the point up; any reliance on specific compiler optimizations may skew the results. It's important to have a fair test.
Aaronaught
+2  A: 

First, never do such benchmarks in debug mode. To get meaningful numbers always use release mode.

The JIT has the advantage of knowing the platform it runs on, while precompiled code may be suboptimal for the platform it is running on.
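
As a rough illustration of that point: an ahead-of-time compiler only exploits the build machine's instruction set if you ask it to, e.g. with GCC (hypothetical file name):

g++ -O2 -march=native primes.cpp -o primes

Here -march=native tunes the binary for the CPU it is compiled on, which is roughly the information a JIT gets for free at run time.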

Lucero
Maybe, but there is of course the overhead of actually doing the compilation on that platform.
Billy ONeal
+6  A: 

Why would you assume that jitted code is slower than native code? The only speed penalty would be the actual jitting, which only happens once (generally speaking). Given a program with a 30-second running time, we are talking about a minuscule portion of the total cost.

I think you may be confusing jitted code with interpreted code, which is translated and executed line-by-line. There's a pretty significant difference between the two.

As others have pointed out, you also need to run this in release mode; debug mode turns off most optimizations, so both versions will be slower than they should be (but by different amounts).

Edit: I should point out one other thing, which is that this line:

for (j = 2; j <= Math.Sqrt(i); j++)

is incredibly inefficient and may interfere with the benchmarking. You should calculate Math.Sqrt(i) outside of the inner loop. It's possible that this slows down both versions by an equivalent amount, but I'm not sure; different compilers will perform different optimizations.
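
A minimal sketch of the hoisted form, shown for the C++ variant (the equivalent C# change is to store Math.Sqrt(i) in a local before the loop):

#include <cmath>

// trial division with the square-root bound computed once per candidate
static int is_prime(unsigned long i) {
    unsigned long limit = (unsigned long)std::sqrt((double)i);
    for (unsigned long j = 2; j <= limit; j++)
        if (i % j == 0)
            return 0;
    return i >= 2;
}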

Aaronaught
+1. C++ suffers more from the optimizations turned off because it relies on inlining to be performant. Also, the JIT is going to perform some optimizations on C# even if the binary was built in debug mode.
Billy ONeal
Not only does it turn off most optimizations, it may add a bunch of extraneous error checking. Visual Studio, for example, adds overflow checking, heap checking, and various other exciting things in the default debug mode configuration.
kibibu
+6  A: 

It's taking so much longer because the algorithm is wrong.

for(j = 2; j < (i^(1/2)); j++) 

is the same as

for(j = 2; j < (i^0); j++) 

is the same as

for(j = 2; j < i; j++) 

i is a lot bigger than sqrt(i). Looking at just the running time, it's an order of magnitude larger than it should be in the C++ implementation.

Also, like everybody else is saying, I don't think it makes sense to do performance testing in debug mode.

Chris
I've corrected the mistake, I'm sorry. I'll try to run them in release mode too.
Mack
A: 

Both tests are invalid because you've compiled without optimizations.

The first test is meaningless even as a comparison of unoptimized behaviour, because of an error in your code: Math.Sqrt(i) returns the square root of i, while i^(1/2) returns i, so the C++ version was doing much more work than the C#.

More generally, this isn't a useful thing to do: you're trying to create a synthetic benchmark that has little to no relevance to real-world usage.

Joe Gauterin
+2  A: 

It is a persistent myth that the JIT compiler in managed code generates machine code that is a lot less efficient than what a C/C++ compiler generates. Managed code usually wins on memory management and floating-point math; C/C++ usually wins when the code optimizer can spend a lot more time optimizing the code. In general, managed code runs at roughly 80% of native speed, but it completely depends on the 10% of the code where a program spends 90% of its time.

Your test won't show this: you didn't enable the optimizer, and there isn't much to optimize.

Hans Passant
"Managed code usually wins on memory management" <-- Because C/C++ programmers are typically lazy and just use giant buffers instead of something like a `std::vector<t>` which is common in JIT'd languages. The speeds of both the compiled code and the JIT'd code should be about the same, all said and done, so long as you discount the time it takes to load the jitter into memory (which can be quite large in Java's case :P ) +1
Billy ONeal
The managed code can hardly win on floating point operations because the current .Net JIT doesn't emit any SSE instructions.
Jasper Bekkers
There's more than one, the x64 one does. Example: http://stackoverflow.com/questions/686483/c-vs-c-big-performance-difference/687741#687741
Hans Passant
It seems that x64 JIT is a totally different when it comes to conception: " The 32-bit JIT and the 64-bit JIT were implemented by two different teams within Microsoft using two different code bases. The 32-bit JIT was developed by the CLR team, whereas the 64-bit JIT was developed by the Visual C++ team and is based on the Visual C++ code base. Because the 64-bit JIT was developed by the C++ team, it is more aware of C++-related issues."
Mack
Yes, the x86 JITter dates from ~1998. Was SSE2 around then? x64 dates from ~2003 and is indeed done by a different team. Not the C++ team. It never generates code from a C++ program, it generates code from IL.
Hans Passant
+1  A: 

Guys, before comparing the speed of one program to another, please bother to read a few articles on CPU instructions, assembly, cache management, etc. And checking the performance of debug builds is just ridiculous.

Billy O'Neal: what is the difference, in low-level terms, between allocating a big buffer and using only a small part of it, versus using something dynamically allocated like a vector? Once the big buffer has been allocated, nobody worries about the unused part and no further bookkeeping is needed, while a dynamic structure like a vector needs constant bounds checking so as not to overrun it. Remember, C++ programmers are not just lazy (which is quite true, I admit); they're also smart.

Alexander Solonsky
A: 

How about this:

int countprimes = 0;
for (int sqrti = 1; sqrti <= 11112; sqrti++) {
    int nexti = (sqrti + 1) * (sqrti + 1);
    // for every i in [sqrti^2, (sqrti+1)^2), floor(sqrt(i)) == sqrti,
    // so trial division only has to test j = 2 .. sqrti
    for (int i = sqrti * sqrti; i < nexti; i++) {
        int isprime = 1;
        for (int j = 2; j <= sqrti; j++)
            if (!(i % j)) {
                isprime = 0;
                break;
            }
        countprimes += isprime;
    }
}

Chris