I threw together a quick-and-dirty benchmark in C# using a prime generator as a test. The test generates primes up to a constant limit (I chose 500000) using a simple Sieve of Eratosthenes implementation and repeats the test 800 times, parallelized over a specific number of threads, either using the .NET ThreadPool
or standalone threads.
The test was run on a Quad-Core Q6600 running Windows Vista (x64). This is not using the Task Parallel Library, just simple threads. It was run for the following scenarios:
- Serial execution (no threading)
- 4 threads (i.e. one per core), using the
ThreadPool
- 40 threads using the
ThreadPool
(to test the efficiency of the pool itself)
- 4 standalone threads
- 40 standalone threads, to simulate context-switching pressure
The results were:
Test | Threads | ThreadPool | Time
-----+---------+------------+--------
1 | 1 | False | 00:00:17.9508817
2 | 4 | True | 00:00:05.1382026
3 | 40 | True | 00:00:05.3699521
4 | 4 | False | 00:00:05.2591492
5 | 40 | False | 00:00:05.0976274
Conclusions one can draw from this:
Parallelization isn't perfect (as expected - it never is, no matter the environment), but splitting the load across 4 cores results in about 3.5x more throughput, which is hardly anything to complain about.
There was negligible difference between 4 and 40 threads using the ThreadPool
, which means that no significant expense is incurred with the pool, even when you bombard it with requests.
There was negligible difference between the ThreadPool
and free-threaded versions, which means that the ThreadPool
does not have any significant "constant" expense;
There was negligible difference between the 4-thread and 40-thread free-threaded versions, which means that .NET doesn't perform any worse than one would expect it to with heavy context-switching.
Do we even need a C++ benchmark to compare to? The results are pretty clear: Threads in .NET are not slow. Unless you, the programmer, write poor multi-threading code and end up with resource starvation or lock convoys, you really don't need to worry.
With .NET 4.0 and the TPL and improvements to the ThreadPool
, work-stealing queues and all that cool stuff, you have even more leeway to write "questionable" code and still have it run efficiently. You don't get these features at all from C++.
For reference, here is the test code:
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Runtime.CompilerServices;
using System.Threading;
namespace ThreadingTest
{
class Program
{
private static int PrimeMax = 500000;
private static int TestRunCount = 800;
static void Main(string[] args)
{
Console.WriteLine("Test | Threads | ThreadPool | Time");
Console.WriteLine("-----+---------+------------+--------");
RunTest(1, 1, false);
RunTest(2, 4, true);
RunTest(3, 40, true);
RunTest(4, 4, false);
RunTest(5, 40, false);
Console.WriteLine("Done!");
Console.ReadLine();
}
static void RunTest(int sequence, int threadCount, bool useThreadPool)
{
TimeSpan duration = Time(() => GeneratePrimes(threadCount, useThreadPool));
Console.WriteLine("{0} | {1} | {2} | {3}",
sequence.ToString().PadRight(4),
threadCount.ToString().PadRight(7),
useThreadPool.ToString().PadRight(10),
duration);
}
static TimeSpan Time(Action action)
{
Stopwatch sw = new Stopwatch();
sw.Start();
action();
sw.Stop();
return sw.Elapsed;
}
static void GeneratePrimes(int threadCount, bool useThreadPool)
{
if (threadCount == 1)
{
TestPrimes(TestRunCount);
return;
}
int testsPerThread = TestRunCount / threadCount;
int remaining = threadCount;
using (ManualResetEvent finishedEvent = new ManualResetEvent(false))
{
for (int i = 0; i < threadCount; i++)
{
Action testAction = () =>
{
TestPrimes(testsPerThread);
if (Interlocked.Decrement(ref remaining) == 0)
{
finishedEvent.Set();
}
};
if (useThreadPool)
{
ThreadPool.QueueUserWorkItem(s => testAction());
}
else
{
ThreadStart ts = new ThreadStart(testAction);
Thread th = new Thread(ts);
th.Start();
}
}
finishedEvent.WaitOne();
}
}
[MethodImpl(MethodImplOptions.NoOptimization)]
static void IteratePrimes(IEnumerable<int> primes)
{
int count = 0;
foreach (int prime in primes) { count++; }
}
static void TestPrimes(int testRuns)
{
for (int t = 0; t < testRuns; t++)
{
var primes = Primes.GenerateUpTo(PrimeMax);
IteratePrimes(primes);
}
}
}
}
And here is the prime generator:
using System;
using System.Collections.Generic;
using System.Linq;
namespace ThreadingTest
{
public class Primes
{
public static IEnumerable<int> GenerateUpTo(int maxValue)
{
if (maxValue < 2)
return Enumerable.Empty<int>();
bool[] primes = new bool[maxValue + 1];
for (int i = 2; i <= maxValue; i++)
primes[i] = true;
for (int i = 2; i < Math.Sqrt(maxValue + 1) + 1; i++)
{
if (primes[i])
{
for (int j = i * i; j <= maxValue; j += i)
primes[j] = false;
}
}
return Enumerable.Range(2, maxValue - 1).Where(i => primes[i]);
}
}
}
If you see any obvious flaws in the test, let me know. Barring any serious problems with the test itself, I think the results speak for themselves, and the message is clear:
Don't listen to anyone who makes overly broad and unqualified statements about how the performance of .NET or any other language/environment is "bad" in some particular area, because they are probably talking out of their... rear ends.