I am having some issues involving Parallel.For loops and adding to a List. The problem is that the same code may generate different output at different times. I have set up some test code below. In this code, I create a List of 10,000 int values: 1/10th of the values will be 0, 1/10th will be 1, and so on up to 1/10th being 9.

After setting up this List, I set up a Parallel.For loop that iterates through it. If the current number is 0, I add a value to a new List. After the Parallel.For loop completes, I output the size of that list. The size should always be 1,000. Most of the time, the correct answer is given. However, I have seen three possible incorrect outcomes:

  1. The size of the list is less than 1,000
  2. An IndexOutOfRangeException occurs @ doubleList.Add(0.0);
  3. An ArgumentException occurs @ doubleList.Add(0.0);

The message for the ArgumentException given was: Destination array was not long enough. Check destIndex and length, and the array's lower bounds.

What could be causing the errors? Is this a .NET bug? Is there something I can do to prevent this from happening?

Please try the code for yourself. If you do not get an error, try it a few times. Please also note that you probably will not see any errors using a single-core machine.

using System;
using System.Collections.Generic;
using System.Threading.Tasks;

namespace ParallelTest
{
    class Program
    {
        static void Main(string[] args)
        {
            List<int> intList = new List<int>();
            List<double> doubleList = new List<double>();

            for (int i = 0; i < 250; i++)
            {
                intList.Clear();
                doubleList.Clear();

                for (int j = 0; j < 10000; j++)
                {
                    intList.Add(j % 10);
                }

                Parallel.For(0, intList.Count, j =>
                {
                    if (intList[j] == 0)
                    {
                        doubleList.Add(0.0);
                    }
                });

                if (doubleList.Count != 1000)
                {
                    Console.WriteLine("On iteration " + i + ": List size = " + doubleList.Count);
                }
            }

            Console.WriteLine("\nPress any key to exit.");
            Console.ReadKey();
        }
    }
}
+8  A: 

I expect that System.Collections.Generic.List is not thread-safe, meaning that if you try to concurrently Add from two different threads, things will go wrong. Ah yes, it says so in the docs.

You could prevent this from happening in a number of ways:

  • use a collection type that allows thread-safe adds (there are some new ones in .NET 4.0; sketched below)
  • lock before you add (also sketched below)
  • use thread-local storage for the collections, and merge them at the end
  • ...
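
As a minimal sketch of the first two options, based on the test loop from the question (ConcurrentBag and the lock object are just one possible choice of thread-safe collection and locking scheme; this assumes .NET 4.0 for System.Collections.Concurrent):

using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

class SafeAddExamples
{
    static void Main()
    {
        List<int> intList = new List<int>();
        for (int j = 0; j < 10000; j++)
        {
            intList.Add(j % 10);
        }

        // Option 1: a collection designed for concurrent adds.
        ConcurrentBag<double> bag = new ConcurrentBag<double>();
        Parallel.For(0, intList.Count, j =>
        {
            if (intList[j] == 0)
            {
                bag.Add(0.0);
            }
        });
        Console.WriteLine("ConcurrentBag count: " + bag.Count);      // always 1000

        // Option 2: keep the List<double>, but serialize the adds with a lock.
        List<double> doubleList = new List<double>();
        object sync = new object();
        Parallel.For(0, intList.Count, j =>
        {
            if (intList[j] == 0)
            {
                lock (sync)
                {
                    doubleList.Add(0.0);
                }
            }
        });
        Console.WriteLine("Locked List count: " + doubleList.Count); // always 1000
    }
}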

These are very typical issues you encounter with data parallel code.

Brian
+1. System.Collections.Generic.List is not thread safe. There are thread-safe collections in .NET Framework 4.0
Mitch Wheat
+1 for your third suggestion which might be much faster than the other two.
tster
@tster - yeah, there's even a Parallel.For overload to help with that strategy: http://msdn.microsoft.com/en-us/library/dd783586.aspx
Brian
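
A minimal sketch of that overload, assuming the same intList setup as in the question (localInit creates one private list per thread, the body only touches that thread's list, and localFinally merges each thread's list into the shared result under a lock):

using System;
using System.Collections.Generic;
using System.Threading.Tasks;

class ThreadLocalMerge
{
    static void Main()
    {
        List<int> intList = new List<int>();
        for (int j = 0; j < 10000; j++)
        {
            intList.Add(j % 10);
        }

        List<double> doubleList = new List<double>();
        object sync = new object();

        Parallel.For(0, intList.Count,
            () => new List<double>(),           // localInit: one private list per thread
            (j, loopState, localList) =>        // body: adds only to the thread's own list
            {
                if (intList[j] == 0)
                {
                    localList.Add(0.0);
                }
                return localList;
            },
            localList =>                        // localFinally: merge once per thread
            {
                lock (sync)
                {
                    doubleList.AddRange(localList);
                }
            });

        Console.WriteLine(doubleList.Count);    // always 1000
    }
}
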
Brian is correct; the System.Collections.Generic.* classes are not thread-safe. See http://www.codeproject.com/KB/collections/LockingList.aspx for a thread-safe list class, or this one for a queue class: http://www.codeproject.com/Articles/38908/Thread-Safe-Generic-Queue-Class.aspx
Simon Chadwick
All good suggestions, but I'd like to propose a fourth: Don't use `Parallel` at all. There's so little work being done inside the `Parallel` block that the overhead of multiple threads and locking/merging is just going to end up costing more than the single-threaded version.
Aaronaught
@Aaronaught, I'm pretty sure this is just an example from Kevin playing with the parallel stuff. I doubt he has a real business need to add a value to a list for each element in an array he generated randomly which is 0.
tster
tster: You're right that this is just a bad example for parallelization, but Aaronaught has a point that not every loop is a good candidate for it. If the synchronization or task scheduling overhead is more than the task itself, there's no point.
Gabe
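
For comparison, the single-threaded version that Aaronaught and Gabe are referring to can be as simple as a plain loop or a LINQ query (the LINQ form here is just one way to write it), with no shared mutable state and no synchronization at all:

using System;
using System.Collections.Generic;
using System.Linq;

class SequentialVersion
{
    static void Main()
    {
        List<int> intList = new List<int>();
        for (int j = 0; j < 10000; j++)
        {
            intList.Add(j % 10);
        }

        // No parallelism, no locking: collect the zeros sequentially.
        List<double> doubleList = intList.Where(n => n == 0)
                                         .Select(n => 0.0)
                                         .ToList();

        Console.WriteLine(doubleList.Count); // always 1000
    }
}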
