views:

607

answers:

6

I'm just looking in to the new .NET 4.0 features. With that, I'm attempting a simple calculation using Parallel.For and a normal for(x;x;x) loop.

However, I'm getting different results about 50% of the time.

long sum = 0;

Parallel.For(1, 10000, y =>
    {
        sum += y;
    }
);

Console.WriteLine(sum.ToString());

sum = 0;

for (int y = 1; y < 10000; y++)
{
   sum += y;
}
Console.WriteLine(sum.ToString());

My guess is that the threads are trying to update "sum" at the same time.
Is there an obvious way around it?

+3  A: 

Incrementing a long isn't an atomic operation.

Eric Mickelsen
Good point, SLaks. @TSS: there are two operations here, the addition and saving the value - you really do need to lock.
Eric Mickelsen
+4  A: 

Your surmise is correct.

When you write sum += y, the runtime does the following:

  1. Read the field onto the stack
  2. Add y to the stack
  3. Write the result back to the field

If two threads read the field at the same time, the change made by the first thread will be overwritten by the second thread.

You need to use Interlocked.Add, which performs the addition as a single atomic operation.

SLaks
See below. The nieve way to use interlocked add simply serialized your loop.
Ade Miller
I would add that best way it so use local variable and after cycle add them to single global, using Interlocked.Add of course.
Andrey
There's example of this in my answer below.
Ade Miller
Joe Duffy's book has a good discussion on what actually happens under the hood on about page 21. http://www.bluebytesoftware.com/books/winconc/winconc_book_resources.html
Ade Miller
you loose the benefit of multi threading this if you take it back to an atomic operation. Ade Miller has a great answer to this question.
Jon
+9  A: 

sum += y; is actually sum = sum + y;. You are getting incorrect results because of the following race condition:

  1. Thread1 reads sum
  2. Thread2 reads sum
  3. Thread1 calculates sum+y1, and stores the result in sum
  4. Thread2 calculates sum+y2, and stores the result in sum

sum is now equal to sum+y2, instead of sum+y1+y2.

BlueRaja - Danny Pflughoeft
+1 for explaining the race condition.
Callum Rogers
+41  A: 

You can't do this. sum is being shared across you parallel threads. You need to make sure that the sum variable is only being accessed by one thread at a time:

// DON'T DO THIS!
Parallel.For(0, data.Count, i =>
{
    Interlocked.Add(ref sum, data[i]);
});

BUT... This is an anti-pattern because you've effectively serialised the loop because each thread will lock on the Interlocked.Add.

What you need to do is add sub totals and merge them at the end like this:

Parallel.For<int>(0, result.Count, () => 0, (i, loop, subtotal) =>
    {
        subtotal += result[i];
        return subtotal;
    },
    (x) => Interlocked.Add(ref sum, x)
);

You can find further discussion of this on MSDN: http://msdn.microsoft.com/en-us/library/dd460703.aspx

PLUG: You can find more on this in Chapter 2 on A Guide to Parallel Programming

The following is also definitely worth a read...

Patterns for Parallel Programming: Understanding and Applying Parallel Patterns with the .NET Framework 4 - Stephen Toub

Ade Miller
awesome answer!
Andrey
Where can I find the exact explanation of the overload you used in this answer?
Alex Bagnolini
@Alex. You can find further discussion of it here: http://msdn.microsoft.com/en-us/library/dd460703.aspx. I updated the answer with the same link.
Ade Miller
+3  A: 

I think it's important to distinguish that this loop is not capable of being partitioned for parallelism, because as has been mentioned above each iteration of the loop is dependent on the prior. The parallel for is designed for explicitly parallel tasks, such as pixel scaling etc. because each iteration of the loop cannot have data dependencies outside its iteration.

Parallel.For(0, input.length, x =>
{
    output[x] = input[x] * scalingFactor;
});

The above an example of code that allows for easy partitioning for parallelism. However a word of warning, parallelism comes with a cost, even the loop I used as an example above is far far too simple to bother with a parallel for because the set up time takes longer than the time saved via parallelism.

Mgetz
You can partition it for parallelism, you just have to think about it in a different way, aggregation.
Ade Miller
true... MPI_AllGather() would be a good example, however some cursory research on MSDN shows that you would have to turn to MPI# to get that functionality... as it doesn't seem to be included. You could however code your own.
Mgetz
A: 

if there are two parameters in this code. For example

long sum1 = 0; long sum2 = 0;

Parallel.For(1, 10000, y => { sum1 += y; sum2=sum1*y; } );

what will we do ? i am guessing that have to use array !