I was worried about C#'s speed for heavy calculations, the kind where you need raw CPU power.

I always thought that C++ is much faster than C# when it comes to calculations, so I did some quick tests. The first test counts the prime numbers below an integer n; the second test computes some pandigital numbers. The idea for the second test comes from here: Pandigital Numbers

C# prime computation:

using System;
using System.Diagnostics;

class Program
{
    static int primes(int n)
    {
        uint i, j;
        int countprimes = 0;

        for (i = 1; i <= n; i++)
        {
            bool isprime = true;

            for (j = 2; j <= Math.Sqrt(i); j++)
                if ((i % j) == 0)
                {
                    isprime = false;
                    break;
                }

            if (isprime) countprimes++;
        }

        return countprimes;
    }

    static void Main(string[] args)
    {
        int n = int.Parse(Console.ReadLine());
        Stopwatch sw = new Stopwatch();

        sw.Start();
        int res = primes(n);
        sw.Stop();
        Console.WriteLine("I found {0} prime numbers between 0 and {1} in {2} msecs.", res, n, sw.ElapsedMilliseconds);
        Console.ReadKey();
    }
}

C++ variant:

#include <iostream>
#include <ctime>
#include <cmath>

using namespace std;

int primes(unsigned long n) {
    unsigned long i, j;
    int countprimes = 0;

    for (i = 1; i <= n; i++) {
        int isprime = 1;
        for (j = 2; j <= sqrt((float)i); j++)
            if (!(i % j)) {
                isprime = 0;
                break;
            }
        countprimes += isprime;
    }
    return countprimes;
}

int main() {
    unsigned long n;
    int res;
    cin >> n;
    clock_t start = clock();

    res = primes(n);
    // convert clock ticks to milliseconds so the label below is accurate
    long tprime = (long)((clock() - start) * 1000.0 / CLOCKS_PER_SEC);
    cout << "\nI found " << res << " prime numbers between 1 and " << n << " in " << tprime << " msecs.";
    return 0;
}

When I ran the tests to find primes < 100,000, the C# variant finished in 0.409 seconds and the C++ variant in 0.614 seconds. When I ran them for 1,000,000, C# finished in 6.039 seconds and C++ in about 12.987 seconds.

Pandigital test in C#:

using System;
using System.Diagnostics;

class Program
{
    static bool IsPandigital(int n)
    {
        int digits = 0; int count = 0; int tmp;

        for (; n > 0; n /= 10, ++count)
        {
            // n - ((n / 10) * 10) is the last decimal digit; digit d maps to bit d-1.
            // If setting that bit leaves the mask unchanged, the digit was already seen.
            if ((tmp = digits) == (digits |= 1 << (n - ((n / 10) * 10) - 1)))
                return false;
        }

        // pandigital iff exactly the digits 1..count were each seen once
        return digits == (1 << count) - 1;
    }

    static void Main()
    {
        int pans = 0;
        Stopwatch sw = new Stopwatch();
        sw.Start();

        for (int i = 1; i <= 123456789; i++)
        {
            if (IsPandigital(i))
            {
                pans++;
            }
        }
        sw.Stop();
        Console.WriteLine("{0}pcs, {1}ms", pans, sw.ElapsedMilliseconds);
        Console.ReadKey();
    }
}

Pandigital test in C++:

#include <iostream>
#include <ctime>

using namespace std;

int IsPandigital(int n)
{
    int digits = 0; int count = 0; int tmp;

    for (; n > 0; n /= 10, ++count)
    {
        if ((tmp = digits) == (digits |= 1 << (n - ((n / 10) * 10) - 1)))
            return 0;
    }

    return digits == (1 << count) - 1;
}

int main() {
    int pans = 0;
    clock_t start = clock();

    for (int i = 1; i <= 123456789; i++)
    {
        if (IsPandigital(i))
        {
            pans++;
        }
    }
    // convert clock ticks to milliseconds
    long ptime = (long)((clock() - start) * 1000.0 / CLOCKS_PER_SEC);
    cout << "\nPans: " << pans << " time: " << ptime << " ms";
    return 0;
}

The C# variant runs in 29.906 seconds and the C++ one in about 36.298 seconds.

I didn't touch any compiler switches; both the C# and C++ programs were compiled with debug options. Before running the tests I was worried that C# would lag well behind C++, but now it seems that there is a pretty big speed difference in C#'s favor.

Can anybody explain this? C# is jitted and C++ is compiled to native code, so it would be normal for the C++ variant to be faster than the C# one.

Thanks for the answers!

I've redone all the tests in the Release configuration.

First test (prime numbers)

C# (numbers < 100,000): 0.189 seconds; C++ (numbers < 100,000): 0.036 seconds

C# (numbers < 1,000,000): 5.300 seconds; C++ (numbers < 1,000,000): 1.166 seconds

Second test (pandigital numbers):

C#: 21.224 seconds C++: 4.104 seconds

So everything has changed; now C++ is much faster. My mistake was running the tests in the Debug configuration. Can I see some speed improvement if I run the C# executables through ngen?

The reason I tried to compare C# and C++ is that I know the basics of both and I wanted to learn an API for GUI work. WPF looks nice, and since I'm targeting the desktop I wanted to see whether C# can deliver enough speed and performance when it comes to using sheer CPU power for various computations (file archivers, cryptography, codecs, etc.). But sadly, it seems that C# can't keep pace with C++ when it comes to speed.

So I'm assuming I will be forever stuck with this question, Tough question on WPF, Win32, MFC, and I'll never find a suitable API.

+3  A: 

Recompile the C++ program with full optimizations turned on and rerun the tests. The C# JIT optimizes the code when it's jitted, so you compared optimized C#/.NET code against unoptimized C++.
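
For example, with the Visual Studio command-line tools, optimized builds of both programs could be produced roughly like this (the file names are hypothetical and the exact switches depend on your setup):

csc /optimize+ primes.cs
cl /O2 /EHsc primes.cpp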

Brian Ensink
+10  A: 

You need to compile the C++ in release mode and enable optimizations to get the performance results you are looking for.

Axel Gneiting
I will try that too.
Mack
+11  A: 

The prime generator in C++ is not correct:

i^(1/2) == i xor 0

^ is the bitwise XOR operator and / is integer division.

1st edit: it is actually correct, just inefficient. Since i xor 0 == i, the trial-division loop doesn't stop at sqrt(i) but at i.

2nd edit:

The sieving can be done a bit more efficiently (you only need to compute sqrt(n)). This is how I implemented the Sieve of Eratosthenes for my own use (this is in C99, though):

#include <math.h>
#include <string.h>

void sieve(const int n, unsigned char* primes)
{
        // start with every number marked as (potentially) prime
        memset(primes, 1, (n+1) * sizeof(unsigned char));

        // sieve of eratosthenes
        primes[0] = primes[1] = 0;
        int m = floor(sqrt(n));
        for (int i = 2; i <= m; i++)
                if (primes[i]) // no need to remove multiples of i if it is not prime
                        for (int j = i; j <= (n/i); j++)
                                primes[i*j] = 0;
}
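
A minimal driver for it might look like this (a hypothetical usage sketch; it assumes a buffer of n+1 bytes and counts the primes up to n):

#include <stdio.h>
#include <stdlib.h>

int main(void) {
        const int n = 1000000;
        unsigned char* primes = malloc((n + 1) * sizeof(unsigned char));
        if (!primes) return 1;

        sieve(n, primes);   // after this, primes[i] is 1 iff i is prime

        int count = 0;
        for (int i = 2; i <= n; i++)
                count += primes[i];

        printf("%d primes <= %d\n", count, n);
        free(primes);
        return 0;
}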
sisis
I corrected it. Thanks.
Mack
I wonder, though, whether or not this actually makes it run slower, because the C# version has to actually evaluate `Math.Sqrt(i)` on every iteration, which is many many times more expensive than the single-instruction XOR, so this little mistake might actually make the C++ version *faster* because both versions of the program are inefficient.
Aaronaught
@Aaronaught: Except the C# version is looping to Sqrt(i) and the c++ version is looping to i.
Billy ONeal
@BillyONeal: Completely true, for large values of `n` the C++ version will always be slower because it loops more. What I meant was that the benchmark has been made sensitive to `n` and it shouldn't be.
Aaronaught
@Aaronaught: I doubt C# will recompute it. Nothing in the loop will change it. Contrarily, it should be cached in C++ explicitly.
GMan
@Aaronaught: True, but I think the example data size (1,000,000) qualifies as "large N" in this case :)
Billy ONeal
@GMan: *You* may know that nothing in the loop will change it, but the C# compiler cannot make that assumption. It *will* recompute it on every iteration. If you don't believe me, try it out.
Aaronaught
@Aaronaught: I find that surprising. At least in C++, if the definition of `sqrt` were visible it would never recalculate it.
GMan
@GMan: That is why I brought the point up; any reliance on specific compiler optimizations may skew the results. It's important to have a fair test.
Aaronaught
+2  A: 

First, never do such benchmarks in debug mode. To get meaningful numbers always use release mode.

The JIT has the advantage of knowing the platform it runs on, while precompiled code may be suboptimal for the platform it is running on.
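
As a rough illustration of that point: an ahead-of-time compiler only exploits the build machine's instruction set if you ask it to, e.g. with GCC (hypothetical file name):

g++ -O2 -march=native primes.cpp -o primes

Here -march=native tunes the binary for the CPU it is compiled on, which is roughly the information a JIT gets for free at run time.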

Lucero
Maybe, but there is of course the overhead of actually doing the compilation on that platform.
Billy ONeal
+6  A: 

Why would you assume that jitted code is slower than native code? The only speed penalty would be the actual jitting, which only happens once (generally speaking). Given a program with a 30-second running time, we are talking about a minuscule portion of the total cost.

I think you may be confusing jitted code with interpreted code, which is translated and executed line-by-line. There's a pretty significant difference between the two.

As others have pointed out, you also need to run this in release mode; debug mode turns off most optimizations, so both versions will be slower than they should be (but by different amounts).

Edit: I should point out one other thing, which is that this line:

for (j = 2; j <= Math.Sqrt(i); j++)

is incredibly inefficient and may interfere with the benchmarking. You should calculate Math.Sqrt(i) outside of the inner loop. It's possible that this slows down both versions by an equivalent amount, but I'm not sure; different compilers will perform different optimizations.
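
A minimal sketch of the hoisted form, shown for the C++ variant (the equivalent C# change is to store Math.Sqrt(i) in a local before the loop):

#include <cmath>

// trial division with the square-root bound computed once per candidate
static int is_prime(unsigned long i) {
    unsigned long limit = (unsigned long)std::sqrt((double)i);
    for (unsigned long j = 2; j <= limit; j++)
        if (i % j == 0)
            return 0;
    return i >= 2;
}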

Aaronaught
+1. C++ suffers more from the optimizations turned off because it relies on inlining to be performant. Also, the JIT is going to perform some optimizations on C# even if the binary was built in debug mode.
Billy ONeal
Not only does it turn off most optimizations, it may add a bunch of extraneous error checking. Visual Studio, for example, adds overflow checking, heap checking, and various other exciting things in the default debug mode configuration.
kibibu
+6  A: 

It's taking so much longer because the algorithm is wrong.

for(j = 2; j < (i^(1/2)); j++) 

is the same as

for(j = 2; j < (i^0); j++) 

is the same as

for(j = 2; j < i; j++) 

i is a lot bigger than sqrt(i). Looking at just the running time, it's an order of magnitude larger than it should be in the C++ implementation.

Also, like everybody else is saying, I don't think it makes sense to do performance testing in debug mode.

Chris
I've corrected the mistake, I'm sorry. I'll try to run them in release mode too.
Mack
A: 

Both tests are invalid because you've compiled without optimizations.

The first test is meaningless even as a comparison of unoptimized behaviour, because of an error in your code: Math.Sqrt(i) returns the square root of i, while i^(1/2) returns i, so the C++ version was doing much more work than the C#.

More generally, this isn't a useful thing to do: you're trying to create a synthetic benchmark that has little to no relevance to real-world usage.

Joe Gauterin
+2  A: 

It is a persistent myth that the JIT compiler in managed code generates machine code that is a lot less efficient than what a C/C++ compiler generates. Managed code usually wins on memory management and floating-point math; C/C++ usually wins when the code optimizer can spend a lot more time optimizing the code. In general, managed code runs at roughly 80% of native speed, but it completely depends on the 10% of the code where a program spends 90% of its time.

Your test won't show this: you didn't enable the optimizer, and there isn't much to optimize.

Hans Passant
"Managed code usually wins on memory management" <-- Because C/C++ programmers are typically lazy and just use giant buffers instead of something like a `std::vector<t>` which is common in JIT'd languages. The speeds of both the compiled code and the JIT'd code should be about the same, all said and done, so long as you discount the time it takes to load the jitter into memory (which can be quite large in Java's case :P ) +1
Billy ONeal
The managed code can hardly win on floating point operations because the current .Net JIT doesn't emit any SSE instructions.
Jasper Bekkers
There's more than one, the x64 one does. Example: http://stackoverflow.com/questions/686483/c-vs-c-big-performance-difference/687741#687741
Hans Passant
It seems that x64 JIT is a totally different when it comes to conception: " The 32-bit JIT and the 64-bit JIT were implemented by two different teams within Microsoft using two different code bases. The 32-bit JIT was developed by the CLR team, whereas the 64-bit JIT was developed by the Visual C++ team and is based on the Visual C++ code base. Because the 64-bit JIT was developed by the C++ team, it is more aware of C++-related issues."
Mack
Yes, the x86 JITter dates from ~1998. Was SSE2 around then? x64 dates from ~2003 and is indeed done by a different team. Not the C++ team. It never generates code from a C++ program, it generates code from IL.
Hans Passant
+1  A: 

Guys, before comparing the speed of one program to another, please bother to read a few articles on CPU instructions, assembly, cache management, etc. And checking the performance of debug builds is just ridiculous.

Billy O'Neal: what is the difference, in low-level terms, between allocating a big buffer and using only a small part of it, versus using something dynamically allocated like a vector? Once the big buffer has been allocated, nobody worries about the unused part and no further bookkeeping is needed, while a dynamic structure like a vector needs constant bounds checking so as not to overrun it. Remember, C++ programmers are not just lazy (which is quite true, I admit); they're also smart.

Alexander Solonsky
A: 

How about this:

int countprimes = 0;
for (int sqrti = 1; sqrti <= 11112; sqrti++) {
    int nexti = (sqrti + 1) * (sqrti + 1);
    // for every i in [sqrti^2, (sqrti+1)^2), floor(sqrt(i)) == sqrti,
    // so trial division only has to test j = 2 .. sqrti
    for (int i = sqrti * sqrti; i < nexti; i++) {
        int isprime = 1;
        for (int j = 2; j <= sqrti; j++)
            if (!(i % j)) {
                isprime = 0;
                break;
            }
        countprimes += isprime;
    }
}

Chris