views:

617

answers:

5

I have a Parallel.ForEach loop running an intensive operation inside the body.

The operation can use a Hashtable to store the values, and can be reused for other consecutive loop items. I add to the Hashtable after the intensive operation is complete, the next loop item can look up in the Hashtable and reuse the object, instead of running the intensive operation again.

However, because I am using Parallel.ForEach there is a unsafe issue, causing the Hashtable.Add and the ContainsKey(key) calls go out of sync, as they might be running in parallel. Introducing locks may cause perf issues.

Here's the sample code:

Hashtable myTable = new Hashtable;
Parallel.ForEach(items, (item, loopState) =>
{
    // If exists in myTable use it, else add to hashtable
    if(myTable.ContainsKey(item.Key))
    {
       myObj = myTable[item.Key];
    }
    else
    {
       myObj = SomeIntensiveOperation();
       myTable.Add(item.Key, myObj); // Issue is here : breaks with exc during runtime
    }
    // Do something with myObj
    // some code here
}

There must be some API, Property setting inside TPL library, that could handle this scenario. Is there?

+1  A: 

I see no other correct choice than to use (more or less explicit) locks (A synchronized Hashtable just overrides all methods with locks).

Another option could be to allow the dictionary to go out of sync. The race condition will not corrupt the dictionary, it will just require the code to do some superfluous computations. Profile the code to check whether the lock or missing memoization has worse effects.

Dario
+8  A: 

You're looking for System.Collections.Concurrent.ConcurrentDictionary<TKey, TValue>. The new concurrent collections use significantly improved locking mechanisms and should perform excellectly in parallel algorithms.

Edit: The result might look like this:

ConcurrentDictionary<T,K> cache = ...;
Parallel.ForEach(items, (item, loopState) =>
{
    K value;
    if (!cache.TryGetValue(item.Key, out value))
    {
        value = SomeIntensiveOperation();
        cache.TryAdd(item.Key, value);
    }

    // Do something with value
} );

Word of warning: if the elements in items do not all have unique item.Key, then SomeIntensiveOperation could get called twice for that key. In the example, the key isn't passed to SomeIntensiveOperation, but it means that the "Do something with value" code could execute key/valueA and key/valueB pairs, and only one result would get stored in the cache (not necessarily the first one computed by SomeIntensiveOperation either). You'd need a parallel lazy factory to handle this if it's a problem. Also, for obvious reasons SomeIntensiveOperation should be thread safe.

280Z28
@AdamRalph : since he is using TPL library he is already using .net 4.0
Yassir
280Z28
Yup Thanks for the answers and comments
Vin
+1  A: 

Use a ReaderWriterLock, this has good performance for work that has many reads and few writes that are of a short duration. Your problem seems to fit this specification.

All read operations will run quickly and lock free, the only time anyone will be blocked is when a write is happening, and that write is only as long as it takes to shove something in a Hashtable.

ReaderWriterLockSlim on MSDN

I guess I'll throw down some code...

ReaderWriterLockSlim cacheLock = new ReaderWriterLockSlim();
Hashtable myTable = new Hashtable();
Parallel.ForEach(items, (item, loopState) =>
{
    cacheLock.EnterReadLock();
    MyObject myObj = myTable.TryGet(item.Key);
    cacheLock.ExitReadLock();

    // If the object isn't cached, calculate it and cache it
    if(myObj == null)
    {
       myObj = SomeIntensiveOperation();
       cacheLock.EnterWriteLock();
       try
       {
           myTable.Add(item.Key, myObj);
       }
       finally
       {
           cacheLock.ExitWriteLock();
       }           
    }
    // Do something with myObj
    // some code here
}

static object TryGet(this Hashtable table, object key)
{
    if(table.Contains(key))
        return table[key]
    else
        return null;
}
joshperry
"The .NET Framework has two reader-writer locks, ReaderWriterLockSlim and ReaderWriterLock. ReaderWriterLockSlim is recommended for all new development. ReaderWriterLockSlim is similar to ReaderWriterLock, but it has simplified rules for recursion and for upgrading and downgrading lock state. ReaderWriterLockSlim avoids many cases of potential deadlock. In addition, the performance of ReaderWriterLockSlim is significantly better than ReaderWriterLock."
280Z28
That advice seems sound, so I've updated my answer. For those interested take a look at this MSDN magazine article: http://msdn2.microsoft.com/en-us/magazine/cc163599.aspx
joshperry
+4  A: 

check the System.Collections.Concurrent namespace i think you need ConcurrentDictionary

Yassir
A: 

@280Z28:

I use your snippet code but it doesn't work. It seems that I have a dead-lock on Namespaces.TryGetValue(@namespace, out values) function. Do you know what's my wrong?

Here is the code:

private static ConcurrentDictionary<string, HashSet<string>> Namespaces;
private static string[] NamespaceKeys;

private static readonly int ProcessorCount;

static ProgrammingSyntaxHighlighter()
{
  ProcessorCount = System.Environment.ProcessorCount;

  Assembly[] assemblies = AppDomain.CurrentDomain.GetAssemblies();
  Namespaces = new ConcurrentDictionary<string, HashSet<string>>(ProcessorCount << 1, assemblies.Length);

  Parallel.For(0, assemblies.Length, (i, loopState) =>
  {
    Assembly assembly = assemblies[i];
    Type[] types = null;
    try
    {
      types = assembly.GetTypes();
    }
    catch
    {
    }

    if (types != null)
    {
      //First, try to get all public types in thread local
      HashSet<string> typeNamespaces = new HashSet<string>();
      foreach (Type type in types)
      {
        if (type.IsPublic)
        {
          string @namespace = type.Namespace;
          if (@namespace == null)
            @namespace = string.Empty;

          if (!typeNamespaces.Contains(@namespace))
          {
            typeNamespaces.Add(@namespace);
          }
        }
      }

      HashSet<string> values;
      string assemblyFullName = assembly.FullName;
      foreach (string @namespace in typeNamespaces)
      {
        if (Namespaces.TryGetValue(@namespace, out values))
        {
          if (!values.Contains(assemblyFullName))
            values.Add(assemblyFullName);
        }
        else
        {
          values = new HashSet<string>();
          values.Add(assemblyFullName);
          Namespaces.TryAdd(@namespace, values);
        }
      }
    }
  });
  //Get keys from namespaces
  NamespaceKeys = new string[Namespaces.Count];
  Namespaces.Keys.CopyTo(NamespaceKeys, 0);
}
White Rose