I have a foreach loop that I am parallelizing, and I noticed something odd. The code looks like this:

double sum = 0.0;

Parallel.ForEach(myCollection, arg =>
{
    sum += ComplicatedFunction(arg);
});

// Use sum variable below

When I use a regular foreach loop I get different results. There may be something deeper down inside ComplicatedFunction, but is it possible that the sum variable is being unexpectedly affected by the parallelization?

+8  A: 

You don't protect access to the sum variable. So you have a race condition and the result is unpredictable. Reads and writes of an int sum would at least be atomic, but a double's are not. And the addition sum += ... is never thread-safe for any type.
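To make that concrete: even if the accumulator were an int, the compound add would still need Interlocked to be safe. A minimal sketch (ComplicatedIntFunction is a hypothetical int-returning variant of the question's function; note that Interlocked.Add only has int and long overloads, so it cannot help with a double):

int sum = 0;
Parallel.ForEach(myCollection, arg =>
{
    // sum += ComplicatedIntFunction(arg);  // read-modify-write: still NOT thread-safe
    Interlocked.Add(ref sum, ComplicatedIntFunction(arg)); // atomic read-modify-write (int/long only)
});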

You could use something like:

double sum = myCollection.AsParallel().Sum(arg => ComplicatedFunction(arg));
Henk Holterman
This worked. Nice clean implementation using LINQ.
Brian Triplett
+4  A: 

If you think of that sum += ComplicatedFunction(arg) as actually being composed of a bunch of operations, say:

r1 <- Load current value of sum
r2 <- ComplicatedFunction(...)
r1 <- r1 + r2
sum <- Store r1

So now we randomly interleave two (or more) parallel instances of this. One thread may be holding a stale "old value" of sum which it uses to perform its computation, the result of which it writes back over top of some modified version of sum. It's a classic race condition, because some results are getting lost in a nondeterministic way based on how the interleaving is done.
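A self-contained sketch (not from the original post) that usually makes the lost updates visible:

double sum = 0.0;
// Every iteration adds 1.0, so a correct run would print 1000000.
Parallel.For(0, 1000000, i => { sum += 1.0; });
Console.WriteLine(sum); // typically prints less than 1000000 because interleaved updates are lost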

Gian
You are right, but the situation is in fact much worse than you state. It's not just that the load, calculate and store operations are not atomic. Accessing the *bits* in a double is not even guaranteed to be atomic! The C# specification only guarantees that accessing 32 bit (and smaller) numeric types and references are atomic. Doubles are 64 bit and therefore not guaranteed to be atomic. This program could be realized as: r1 <-- load top 32 bits of sum, r1 <-- load bottom 32 bits of sum... meaning that the operations could be interleaved while *half* of the double has been copied.
Eric Lippert
Well stated. I figured for simplicity in the example I would just assume atomicity of the basic operations, but obviously as you point out, the worst case is even more utterly horrible.
Gian
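To illustrate Eric's point, here is a sketch of a torn read, using a long since it has the same 64-bit non-guarantee as double and torn values are easy to spot. On a 32-bit process this typically fires quickly; on x64, aligned 8-byte writes happen to be atomic in hardware, so it may never fire:

using System;
using System.Runtime.CompilerServices;
using System.Threading;

class TornReadDemo
{
    static long shared; // 64 bits wide: reads and writes are not guaranteed atomic, just like double

    [MethodImpl(MethodImplOptions.NoInlining)]
    static long Read() { return shared; } // plain non-atomic read; NoInlining keeps the JIT from caching it

    static void Main()
    {
        // The writer alternates between two values whose upper and lower 32-bit halves differ.
        var writer = new Thread(() =>
        {
            while (true)
            {
                shared = 0L;
                shared = -1L; // all 64 bits set
            }
        }) { IsBackground = true };
        writer.Start();

        while (true)
        {
            long v = Read();
            if (v != 0L && v != -1L)
            {
                // The top half of one write combined with the bottom half of another.
                Console.WriteLine("Torn read: 0x{0:X16}", v);
                return;
            }
        }
    }
}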
+2  A: 

Like the other answers mentioned, updating the sum variable from multiple threads (which is what Parallel.ForEach does) is not a thread-safe operation. The trivial approach of acquiring a lock before doing the update will fix that problem.

double sum = 0.0;
Parallel.ForEach(myCollection, arg => 
{ 
  lock (myCollection)
  {
    sum += ComplicatedFunction(arg);
  }
});

However, that introduces yet another problem. Since the lock is acquired on each iteration, the execution of the iterations is effectively serialized. In other words, it would have been better to just use a plain old foreach loop.

Now, the trick to getting this right is to partition the problem into separate and independent chunks. Fortunately that is super easy to do when all you want is to sum the results of the iterations, because the sum operation is commutative and associative and because the intermediate results of the iterations are independent. (Strictly speaking, floating-point addition is not perfectly associative, so even a correct parallel sum can differ from the sequential result in the last few bits.)

So here is how you do it.

double sum = 0.0;
Parallel.ForEach(myCollection,
    () => 0D, // localInit: each thread starts its own subtotal at zero
    (item, state, subtotal) => // body: accumulate into the thread's private subtotal
    {
        return subtotal + ComplicatedFunction(item);
    },
    (subtotal) => // localFinally: merge each thread's subtotal into the grand total
    {
        lock (myCollection) // a dedicated lock object would be better practice here
        {
            sum += subtotal;
        }
    });
Brian Gideon
Why are you encouraging reinvention of a very standard wheel?
Novelocrat
@Novelocrat: Sorry, I am not clear on what you are asking. Also, because of the timing can I assume you downvoted this answer? If so which part of the answer is wrong? I double checked the code syntax and the partitioning strategy is a pretty well established practice for doing `Parallel.For` operations, but maybe I missed something that caught your eye.
Brian Gideon
The thing whose implementation you describe is exactly the library function that Henk's answer describes. Furthermore, I strongly suspect that the reduction of each thread's subtotal ('Accumulator') in the library implementation is much more efficient than your lock-based approach.
Novelocrat
Yeah, I prefer Henk's answer as well. In fact, I was one of the upvotes. Just curious...what in this answer makes it more harmful than helpful and what could I change to make it more helpful? Keep in mind that I would like to preserve my original intent of demonstrating how you might approach the parallelization in more general terms. Otherwise there would not be much point in keeping the answer in light of Henk's post.
Brian Gideon
Oh, and yes, the accumulation of the partitions is more efficient with the PLINQ sum operation because no locks or thread synchronization primitives whatsoever are used. That part of the operation occurs on the caller's thread synchronously. Unfortunately, `Parallel.For` does not support that same idiom. The lock-free accumulation pattern using the `Parallel.For` method would be slightly more complicated and certainly more difficult to explain, since it would involve calling `Interlocked.CompareExchange` in a while loop.
Brian Gideon
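For completeness, a sketch of the lock-free accumulation Brian describes, replacing the lock in the final step with an `Interlocked.CompareExchange` retry loop (same myCollection and ComplicatedFunction as in the question):

double sum = 0.0;
Parallel.ForEach(myCollection,
    () => 0D, // each thread starts with its own subtotal
    (item, state, subtotal) => subtotal + ComplicatedFunction(item),
    (subtotal) =>
    {
        // Retry until no other thread has modified sum between our read and our write.
        double initial, computed;
        do
        {
            initial = sum;
            computed = initial + subtotal;
        } while (Interlocked.CompareExchange(ref sum, computed, initial) != initial);
    });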