I am writing a client-side .NET application that is expected to use a lot of threads. I was warned that .NET performance is very bad when it comes to concurrency. While I am not writing a real-time application, I want to make sure my application is scalable (i.e. allows many threads) and performs at least roughly as well as an equivalent C++ application.

What is your experience? What is a relevant benchmark?

+7  A: 

This is a myth. .NET does a very good job of managing concurrency, and it scales very well.

If you can, I'd recommend using .NET 4 and the Task Parallel Library. It simplifies many concurrency issues. For details, I'd recommend looking at the MSDN center for Parallel Computing with Managed Code.

If you're interested in details of implementation, I also have a very detailed series on Parallelism in .NET.

Reed Copsey
+9  A: 

You may want to have a look at System.Threading.Tasks introduced in .NET 4.

It introduces a scalable way to use threads through tasks, with a really cool mechanism for sharing work between threads.
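As a minimal sketch of what that looks like in practice (assuming .NET 4 or later; `TaskSketch`, `SumRange` and `SumInParallel` are made-up names for illustration, not anything from the answer), CPU-bound work can be split into a handful of tasks and left to the runtime to schedule on pool threads:

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

static class TaskSketch
{
    // Hypothetical CPU-bound work item: sums the integers in [start, start + count).
    public static long SumRange(int start, int count)
    {
        long sum = 0;
        for (int i = start; i < start + count; i++)
        {
            sum += i;
        }
        return sum;
    }

    // Splits [0, total) into equally sized chunks and runs one task per chunk.
    // Assumption for brevity: total is evenly divisible by chunks.
    public static long SumInParallel(int total, int chunks)
    {
        int chunkSize = total / chunks;
        Task<long>[] tasks = new Task<long>[chunks];
        for (int c = 0; c < chunks; c++)
        {
            int start = c * chunkSize; // per-iteration copy so each lambda captures its own value
            tasks[c] = Task.Factory.StartNew(() => SumRange(start, chunkSize));
        }

        Task.WaitAll(tasks);              // block until every task has finished
        return tasks.Sum(t => t.Result);  // combine the partial results
    }

    static void Main()
    {
        // Sum of 0..999999, computed in 8 tasks.
        Console.WriteLine(SumInParallel(1000000, 8)); // prints 499999500000
    }
}
```

The number of tasks here is independent of the number of threads: the scheduler decides how many pool threads actually run them.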

By the way, I don't know who told you that .NET is not good with concurrency. All of my applications use threads at some point or another. But don't forget that having 10 threads on a 2-core processor is kind of counterproductive, depending on the type of task you're making them do. (If the tasks spend their time waiting on network resources, it may make sense.)

Anyway, don't fear .NET for performance, it's actually quite good.

NPayette
+4  A: 

.NET concurrency performance is going to be pretty close to that of applications written in native code. System.Threading is a very thin layer over the threading API.

Whoever warned you may be noticing that, because multithreaded applications are much easier to write in .NET, they're sometimes being written by less experienced programmers who don't fully understand concurrency, but that's not a technical limitation.

If anecdotal evidence helps, at my last job, we wrote a heavily concurrent trading application that processed over 20,000 market data events per second and updated a massive "main form" grid with the relevant data, all through a fairly massive threading architecture and all in C# and VB.NET. Because of the complexity of the application, we optimized many areas, but never saw an advantage to rewriting the threading code in native C++.

Jekke
+3  A: 

First you should seriously reconsider whether or not you need a lot of threads or just some. It's not that .NET threads are slow. Threads are slow. Task switching is an expensive operation no matter who wrote the algorithm.

This is a place, like many others, where design patterns can help. There are already good answers that touch on this, so I'll just make it explicit. You are better off using a command pattern to marshal work onto a few worker threads, and then getting that work done as quickly as possible in sequence, than you are spinning up a bunch of threads to do a bunch of work in "parallel". That work isn't really being done in parallel; it is divided up into little chunks that are woven together by the scheduler.

In other words: you are better off dividing the work into chunks of value using your mind and knowledge to decide where the boundaries between units of value live than you are letting some generic solution like the operating system decide for you.
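One way to read that advice, sketched under assumptions: a small, fixed set of worker threads drains a shared queue of commands. This uses `BlockingCollection<T>` from .NET 4; the `CommandQueue` and `CommandQueueDemo` types are hypothetical names for illustration, and error handling is left out (a command that throws would take its worker thread down with it).

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// A fixed set of worker threads that execute queued commands in arrival order.
sealed class CommandQueue : IDisposable
{
    private readonly BlockingCollection<Action> _commands = new BlockingCollection<Action>();
    private readonly Thread[] _workers;

    public CommandQueue(int workerCount)
    {
        _workers = new Thread[workerCount];
        for (int i = 0; i < workerCount; i++)
        {
            _workers[i] = new Thread(() =>
            {
                // Blocks while the queue is empty; the loop ends once
                // CompleteAdding has been called and the queue is drained.
                foreach (Action command in _commands.GetConsumingEnumerable())
                {
                    command();
                }
            });
            _workers[i].IsBackground = true;
            _workers[i].Start();
        }
    }

    public void Enqueue(Action command)
    {
        _commands.Add(command);
    }

    // Stops accepting work, waits for queued commands to finish, then cleans up.
    public void Dispose()
    {
        _commands.CompleteAdding();
        foreach (Thread worker in _workers)
        {
            worker.Join();
        }
        _commands.Dispose();
    }
}

static class CommandQueueDemo
{
    // Queues commandCount trivial commands and returns how many were executed.
    public static int RunDemo(int commandCount, int workerCount)
    {
        int processed = 0;
        using (CommandQueue queue = new CommandQueue(workerCount))
        {
            for (int i = 0; i < commandCount; i++)
            {
                queue.Enqueue(() => Interlocked.Increment(ref processed));
            }
        } // Dispose waits here until every queued command has run
        return processed;
    }

    static void Main()
    {
        Console.WriteLine(RunDemo(1000, Environment.ProcessorCount)); // prints 1000
    }
}
```

The thread count is chosen once, typically near the core count, and the amount of queued work can grow without creating more threads.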

What about the case where some threads are spending a lot of time waiting for resources? In this case allowing other threads to step in and use the CPU would be advantageous.
Charles
You can solve that problem with a small number of threads and design. I'm not against _any_ threads. I'm against _"lots"_ of threads.
+12  A: 

I threw together a quick-and-dirty benchmark in C# using a prime generator as a test. The test generates primes up to a constant limit (I chose 500000) using a simple Sieve of Eratosthenes implementation and repeats the test 800 times, parallelized over a specific number of threads, either using the .NET ThreadPool or standalone threads.

The test was run on a Quad-Core Q6600 running Windows Vista (x64). This is not using the Task Parallel Library, just simple threads. It was run for the following scenarios:

  • Serial execution (no threading)
  • 4 threads (i.e. one per core), using the ThreadPool
  • 40 threads using the ThreadPool (to test the efficiency of the pool itself)
  • 4 standalone threads
  • 40 standalone threads, to simulate context-switching pressure

The results were:

Test | Threads | ThreadPool | Time
-----+---------+------------+--------
1    | 1       | False      | 00:00:17.9508817
2    | 4       | True       | 00:00:05.1382026
3    | 40      | True       | 00:00:05.3699521
4    | 4       | False      | 00:00:05.2591492
5    | 40      | False      | 00:00:05.0976274

Conclusions one can draw from this:

  • Parallelization isn't perfect (as expected - it never is, no matter the environment), but splitting the load across 4 cores results in about 3.5x more throughput, which is hardly anything to complain about.

  • There was negligible difference between 4 and 40 threads using the ThreadPool, which means that no significant expense is incurred with the pool, even when you bombard it with requests.

  • There was negligible difference between the ThreadPool and free-threaded versions, which means that the ThreadPool does not have any significant "constant" expense.

  • There was negligible difference between the 4-thread and 40-thread free-threaded versions, which means that .NET doesn't perform any worse than one would expect it to with heavy context-switching.
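The "about 3.5x" figure can be checked directly from the table. The sketch below (helper names are made up for illustration) divides the serial time by the 4-thread ThreadPool time to get the speedup, then divides by the core count to get the parallel efficiency:

```csharp
using System;
using System.Globalization;

static class SpeedupSketch
{
    // Speedup = serial time / parallel time.
    public static double Speedup(TimeSpan serial, TimeSpan parallel)
    {
        return serial.TotalSeconds / parallel.TotalSeconds;
    }

    // Efficiency = speedup / number of cores (1.0 would be perfect scaling).
    public static double Efficiency(TimeSpan serial, TimeSpan parallel, int cores)
    {
        return Speedup(serial, parallel) / cores;
    }

    static void Main()
    {
        // Timings copied from tests 1 and 2 in the results table.
        TimeSpan serial = TimeSpan.Parse("00:00:17.9508817", CultureInfo.InvariantCulture);
        TimeSpan pooled = TimeSpan.Parse("00:00:05.1382026", CultureInfo.InvariantCulture);

        Console.WriteLine(Speedup(serial, pooled).ToString("F2", CultureInfo.InvariantCulture));       // roughly 3.49
        Console.WriteLine(Efficiency(serial, pooled, 4).ToString("F2", CultureInfo.InvariantCulture)); // roughly 0.87
    }
}
```

So the four cores are each doing useful work about 87% of the time, which is where the "about 3.5x" comes from.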

Do we even need a C++ benchmark to compare to? The results are pretty clear: Threads in .NET are not slow. Unless you, the programmer, write poor multi-threading code and end up with resource starvation or lock convoys, you really don't need to worry.

With .NET 4.0 and the TPL and improvements to the ThreadPool, work-stealing queues and all that cool stuff, you have even more leeway to write "questionable" code and still have it run efficiently. You don't get these features at all from C++.
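For comparison, here is a rough sketch of how the same kind of workload might be expressed with the TPL. This is not the benchmark that produced the numbers above and it was not timed as part of this answer; it assumes .NET 4 or later, uses its own small counting sieve so that it stands alone, and the names `TplBenchmarkSketch`, `CountPrimes` and `RunParallel` are illustrative only.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

static class TplBenchmarkSketch
{
    // Counts the primes in [2, maxValue] with a simple Sieve of Eratosthenes.
    public static int CountPrimes(int maxValue)
    {
        if (maxValue < 2)
        {
            return 0;
        }

        bool[] composite = new bool[maxValue + 1];
        int count = 0;
        for (int i = 2; i <= maxValue; i++)
        {
            if (composite[i])
            {
                continue;
            }
            count++;
            // Start at i * i; use long so the product cannot overflow int.
            for (long j = (long)i * i; j <= maxValue; j += i)
            {
                composite[j] = true;
            }
        }
        return count;
    }

    // Repeats the sieve 'runs' times; Parallel.For decides how the
    // iterations are partitioned across pool threads.
    public static long RunParallel(int runs, int maxValue)
    {
        long total = 0;
        Parallel.For(0, runs, run =>
        {
            Interlocked.Add(ref total, CountPrimes(maxValue));
        });
        return total;
    }

    static void Main()
    {
        Stopwatch sw = Stopwatch.StartNew();
        long total = RunParallel(800, 500000); // same limit and repeat count as the test above
        sw.Stop();
        Console.WriteLine("{0} primes counted in {1}", total, sw.Elapsed);
    }
}
```

Note that there is no thread count anywhere in this version: the caller states how much work there is, and the library picks the degree of parallelism.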

For reference, here is the test code:

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Runtime.CompilerServices;
using System.Threading;

namespace ThreadingTest
{
    class Program
    {
        private static int PrimeMax = 500000;
        private static int TestRunCount = 800;

        static void Main(string[] args)
        {
            Console.WriteLine("Test | Threads | ThreadPool | Time");
            Console.WriteLine("-----+---------+------------+--------");
            RunTest(1, 1, false);
            RunTest(2, 4, true);
            RunTest(3, 40, true);
            RunTest(4, 4, false);
            RunTest(5, 40, false);
            Console.WriteLine("Done!");
            Console.ReadLine();
        }

        static void RunTest(int sequence, int threadCount, bool useThreadPool)
        {
            TimeSpan duration = Time(() => GeneratePrimes(threadCount, useThreadPool));
            Console.WriteLine("{0} | {1} | {2} | {3}",
                sequence.ToString().PadRight(4),
                threadCount.ToString().PadRight(7),
                useThreadPool.ToString().PadRight(10),
                duration);
        }

        static TimeSpan Time(Action action)
        {
            Stopwatch sw = new Stopwatch();
            sw.Start();
            action();
            sw.Stop();
            return sw.Elapsed;
        }

        static void GeneratePrimes(int threadCount, bool useThreadPool)
        {
            if (threadCount == 1)
            {
                TestPrimes(TestRunCount);
                return;
            }

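            // Integer division: TestRunCount must be a multiple of threadCount,
            // otherwise the remainder of the runs is silently dropped.
            int testsPerThread = TestRunCount / threadCount;
            // Number of workers still running; the last one to finish signals the event.
            int remaining = threadCount;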
            using (ManualResetEvent finishedEvent = new ManualResetEvent(false))
            {
                for (int i = 0; i < threadCount; i++)
                {
                    Action testAction = () =>
                    {
                        TestPrimes(testsPerThread);
                        if (Interlocked.Decrement(ref remaining) == 0)
                        {
                            finishedEvent.Set();
                        }
                    };

                    if (useThreadPool)
                    {
                        ThreadPool.QueueUserWorkItem(s => testAction());
                    }
                    else
                    {
                        ThreadStart ts = new ThreadStart(testAction);
                        Thread th = new Thread(ts);
                        th.Start();
                    }
                }
                finishedEvent.WaitOne();
            }
        }

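        // GenerateUpTo returns a lazy sequence, so enumerating it here is what
        // actually runs the final filter. NoOptimization is intended to keep
        // the JIT from optimizing away the otherwise unused loop.
        [MethodImpl(MethodImplOptions.NoOptimization)]
        static void IteratePrimes(IEnumerable<int> primes)
        {
            int count = 0;
            foreach (int prime in primes) { count++; }
        }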

        static void TestPrimes(int testRuns)
        {
            for (int t = 0; t < testRuns; t++)
            {
                var primes = Primes.GenerateUpTo(PrimeMax);
                IteratePrimes(primes);
            }
        }
    }
}

And here is the prime generator:

using System;
using System.Collections.Generic;
using System.Linq;

namespace ThreadingTest
{
    public class Primes
    {
        public static IEnumerable<int> GenerateUpTo(int maxValue)
        {
            if (maxValue < 2)
                return Enumerable.Empty<int>();

            bool[] primes = new bool[maxValue + 1];
            for (int i = 2; i <= maxValue; i++)
                primes[i] = true;

            for (int i = 2; i < Math.Sqrt(maxValue + 1) + 1; i++)
            {
                if (primes[i])
                {
                    for (int j = i * i; j <= maxValue; j += i)
                        primes[j] = false;
                }
            }

            return Enumerable.Range(2, maxValue - 1).Where(i => primes[i]);
        }
    }
}

If you see any obvious flaws in the test, let me know. Barring any serious problems with the test itself, I think the results speak for themselves, and the message is clear:

Don't listen to anyone who makes overly broad and unqualified statements about how the performance of .NET or any other language/environment is "bad" in some particular area, because they are probably talking out of their... rear ends.

Aaronaught
The ending kinda tells it all... :) +1. And I wish I could give more for actually taking the time to implement a benchmark: +1 (for an oh-so-meaningless debate, +1).
andras
A: 

What a very nice series of answers. This is one of the best threads on this forum so far!

Turing Complete