views:

231

answers:

5

I've been testing the performance of System.Threading.Parallel vs a Threading and I'm surprised to see Parallel taking longer to finish tasks than threading. I'm sure it's due to my limited knowledge of Parallel, which I just started reading up on.

I thought i'll share few snippets and if anyone can point out to me paralle code is running slower vs threaded code. Also tried to run the same comparison for finding prime numbers and found parallel code finishing much later than threaded code.

public class ThreadFactory
{
    int workersCount;
    private List<Thread> threads = new List<Thread>();

    public ThreadFactory(int threadCount, int workCount, Action<int, int, string> action)
    {
        workersCount = threadCount;

        int totalWorkLoad = workCount;
        int workLoad = totalWorkLoad / workersCount;
        int extraLoad = totalWorkLoad % workersCount;

        for (int i = 0; i < workersCount; i++)
        {
            int min, max;
            if (i < (workersCount - 1))
            {
                min = (i * workLoad);
                max = ((i * workLoad) + workLoad - 1);
            }
            else
            {
                min = (i * workLoad);
                max = (i * workLoad) + (workLoad - 1 + extraLoad);
            }
            string name = "Working Thread#" + i; 

            Thread worker = new Thread(() => { action(min, max, name); });
            worker.Name = name;
            threads.Add(worker);
        }
    }

    public void StartWorking()
    {
        foreach (Thread thread in threads)
        {
            thread.Start();
        }

        foreach (Thread thread in threads)
        {
            thread.Join();
        }
    }
}

Here is the program:

Stopwatch watch = new Stopwatch();
watch.Start();
int path = 1;

List<int> numbers = new List<int>(Enumerable.Range(0, 10000));

if (path == 1)
{
    Parallel.ForEach(numbers, x =>
    {
        Console.WriteLine(x);
        Thread.Sleep(1);

    });
}
else
{
    ThreadFactory workers = new ThreadFactory(10, numbers.Count, (min, max, text) => {

        for (int i = min; i <= max; i++)
        {
            Console.WriteLine(numbers[i]);
            Thread.Sleep(1);
        }
    });

    workers.StartWorking();
}

watch.Stop();
Console.WriteLine(watch.Elapsed.TotalSeconds.ToString());

Console.ReadLine();

Update:

Taking Locking into consideration: I tried the following snippet. Again the same results, Parallel seems to finish much slower.

path = 1; cieling = 10000000;

    List<int> numbers = new List<int>();

    if (path == 1)
    {
        Parallel.For(0, cieling, x =>
        {
            lock (numbers)
            {
                numbers.Add(x);    
            }

        });
    }

    else
    {
        ThreadFactory workers = new ThreadFactory(10, cieling, (min, max, text) =>
        {

            for (int i = min; i <= max; i++)
            {
                lock (numbers)
                {
                    numbers.Add(i);    
                }                       

            }
        });

        workers.StartWorking();
    }

Update 2: Just a quick update that my machine has Quad Core Processor. So Parallel have 4 cores available.

A: 

There's actually a great answer to a similar question here:

http://stackoverflow.com/questions/1116604/c-migrate-a-single-threaded-app-to-multi-threaded-parallel-execution-monte-car

Sohnee
@Sohnee: thanks for the answer...well i have read those concepts and i do understand that threading not always means faster turn around times, but clearly in my examples above - threading will improve the performance. So i'm just trying to compare the performance of threading vs parallel class. And i can't understand why Parallel is so much slower.
ace
+2  A: 

Refering to a blog post by Reed Copsey Jr:

Parallel.ForEach is a bit more complicated, however. When working with a generic IEnumerable, the number of items required for processing is not known in advance, and must be discovered at runtime. In addition, since we don’t have direct access to each element, the scheduler must enumerate the collection to process it. Since IEnumerable is not thread safe, it must lock on elements as it enumerates, create temporary collections for each chunk to process, and schedule this out.

The locking and copying could make Parallel.ForEach take longer. Also partitioning and the scheduler of ForEach could impact and give overhead. I tested your code and increased the sleep of each task, and then the results are closer, but still ForEach is slower.

[Edit - more research]

I added the following to the execution loops:

if (Thread.CurrentThread.ManagedThreadId > maxThreadId)
   maxThreadId = Thread.CurrentThread.ManagedThreadId;

What this shows on my machine is that it uses 10 threads less with ForEach, compared to the other one with the current settings. If you want more threads out of ForEach, you would have to fiddle around with ParallelOptions and the Scheduler.

See http://stackoverflow.com/questions/1114317/does-parallel-foreach-limits-the-number-of-active-threads

Mikael Svenson
Intresting...let me try the comparison with insertion on a list with lock enabled. Thnx.
ace
Saw your update... and modified my answer a bit.
Mikael Svenson
Edited once more :) It boils down to the number of threads being used.
Mikael Svenson
Thanks for the link...i'll fiddle around more and update with results..
ace
I just read that Reed's blog - the partitioning used in this question is what he calls the most simple and naive partitioning. Which makes it a very good feasibility test. Remember OpenMP? A library has to prove a lot to actually be trusted to do parallelization for anything real.
ZXX
@ZXX : Exactly i just threw at it the most basic task of splitting N items into different threads, and still it didn't work well not sure how complicated scenarios are going to pan out.
ace
+1  A: 

I think I can answer your question. First of all, you didn't write how many cores your system has. if you are running a dual-core, only 4 thread will work using the Parallel.For while you are working with 10 threads in your Thread example. More threads will work better as the task you are running (Printing + Short sleep) is a very short task for threading and the thread overhead is very large compared to the task, I'm almost sure that if you write the same code without threads it will work faster.

Both your methods works pretty much the same but if you create all the threads in advance you save allot as the Parallel.For uses the Task pool which adds some move overhead.

Gilad
+1: the question is a total apples vs. oranges comparison because it uses a different number of threads. Console.WriteLine is also a poor choice for a test case.
Hightechrider
@Gilad: I have a quad core.@Hightechrider I even tested it with finding prime numbers. Same result.Please try my code example and see the results.
ace
Reminder again - the sole promise of Parallel-s is to do scheduling "better than by hand". Without that it's not of much use except as a possible convenience syntax.
ZXX
A: 

The comparison is not very fair in regard to Threading.Parallel. You tell your custom thread pool that it'll need 10 threads. Threading.Parallel does not know how much threads it will need so it tries to adapt at run-time taking into account such things as current CPU load and other things. Since the number of iterations in the test is small enough you can this number of threads adaption penalty. Providing the same hint for Threading.Parallel will make it run much faster:


int workerThreads;
int completionPortThreads;
ThreadPool.GetMinThreads(out workerThreads, out completionPortThreads);
ThreadPool.SetMinThreads(10, completionPortThreads);

Konstantin Oznobihin
Thnx...i'll try that and see if it makes a difference.
ace
Don't forget that the sole promise of Parallel-s is to do scheduling "better than by hand". Great thing for the null case though - one more spoon to feed in :-)
ZXX
A: 

It's logical :-)

That would be the first time in history that addition of one (or two) layers of code improved performance. When you use convenience libraries you should expect to pay the price. BTW you haven't posted the numbers. Got to publish results :-)

To make things a bit more failr (or biased :-) for the Parallel-s, convert the list into array.

Then to make them totally unfair, split the work on your own, make an array of just 10 items and totally spoon feed actions to Parallel. You are of course doing the job that Parallel-s promised to do for you at this point but it's bound to be an interesting number :-)

BTW I just read that Reed's blog. The partitioning used in this question is what he calls the most simple and naive partitioning. Which makes it a very good elimination test indeed. You still need to check the zero work case just to know if it's totally hosed.

ZXX
haha...well if i'm doing all the work then it defeats the point rite?
ace
It gives you the measurement to tell you if Parallel-s can at least be faster when they are not doing any work - think of it as an elimination question :-) It at least that part works out fine then they might be viable as a convenience feature. If not - you discovered early on which dll to avoid.
ZXX
@ZXX : not sure if i'm ready to give up on it just yet, there ought to be better optimization, settings i'm missing while making my parallel calls.
ace
I'd tell you what you are missing but don't want to spoil your fun :-) Scheduling is NP complete problem => any software promising to do it in a general case is straight delusional. Special and well defined categories get served by specialized software (IIS and CUDA scheduler as examples).
ZXX