views:

384

answers:

4

Let's take Wes Dyer's approach to function memoization as the starting point:

public static Func<A, R> Memoize<A, R>(this Func<A, R> f)
{
  var map = new Dictionary<A, R>();
  return a =>
    {
      R value;
      if (map.TryGetValue(a, out value))
        return value;
      value = f(a);
      map.Add(a, value);
      return value;
    };
}

The problem is, when using it from multiple threads, we can get into trouble:

Func<int, int> f = ...
var f1 = f.Memoize();
...
in thread 1:
var y1 = f1(1);
in thread 2:
var y2 = f1(1);
// We may be recalculating f(1) here!

Let's try to avoid this. Locking on map:

public static Func<A, R> Memoize<A, R>(this Func<A, R> f)
{
  var map = new Dictionary<A, R>();
  return a =>
    {
      R value;
      lock(map) 
      {
        if (map.TryGetValue(a, out value))
          return value;
        value = f(a);
        map.Add(a, value);
      }
        return value;
    };
}

is clearly a horrible idea, because it prevents us from calculating f1 on many different arguments at once. Locking on a won't work if a has a value type (and at any rate is a bad idea, since we don't control a and outside code may lock on it, too).

Here are two options I can think of:

Assuming a Lazy<T> class for lazy evaluation (see here):

public static Func<A, R> Memoize<A, R>(this Func<A, R> f)
{
  var map = new Dictionary<A, Lazy<R>>();
  return a =>
    {
      Lazy<R> result;
      lock(map) 
      {
        if (!map.TryGetValue(a, out result))
        {  
          result = () => f(a);
          map.Add(a, result);
        }
      }
      return result.Value;
    };
}

Or keeping an additional dictionary of objects for synchronization:

public static Func<A, R> Memoize<A, R>(this Func<A, R> f)
{
  var map = new Dictionary<A, R>();
  var mapSync = new Dictionary<A, object>();
  return a =>
    {
      R value;
      object sync;
      lock(mapSync)
      { 
        if (!mapSync.TryGetValue(a, out sync))
        { 
          sync = new object();
          mapSync[a] = sync;
        }
      }
      lock(map)
      {
        if (map.TryGetValue(a, out value))
          return value;
      }
      lock(sync)
      {
        value = f(a);
        lock(map)
        {
          map[a] = value;
        }
        return value;
      }
    };
}

Any better options?

+1  A: 

No, they are not better options.

The version with the lazy evaluation is pointless as you evaluate it immediately anyway. The version with the synchronisation dictionary doesn't work properly as you are not protecting the map dictionary inside a lock before using it.

The version that you called horrible is actually the best option. You have to protect the map dictionary inside a lock so that only one thread at a time can access it. The dictionary is not thread safe, so if you let one thread read from it while another thread is changing it, you will have problems.

Remember that using lock on the map object doesn't protect the map object in itself, it's only using the map reference as an identifier to keep more than one thread at a time to run the code inside the lock. You have to put all code that accesses the object inside the lock, not just the code that is changing the object.

Guffa
I've fixed the lazy evaluation version.
Alexey Romanov
And the synchronisation dictionary version.
Alexey Romanov
The lazy evaluation version still is pointess as the value is always evaluated immediately. The synchronisation dictionary version still is not safe, as different threads can create objects for the same key, and one will overwrite the other.
Guffa
Lazy evaluation: the threads get the same `Lazy<R>` reference, and thus only one will actually evaluate it. Synchronisation dictionary: how can different threads create objects for the same key under `lock(mapSync)`?
Alexey Romanov
No, more than one thread may actually evaluate the lazy object as it's not synchronised, but that is not the point. Making the object lazy is pointless as it's always evaluated in the next line. In the version with the synchronisation dictionary there is a gap between the lock where you have determined that the object is not in the map and the lock where you create the object. In that gap another thread can determine that it's not in the map and add it to the map. It's a very small gap, but it's there.
Guffa
+8  A: 

If you already have that Lazy<T> type, I assume you're using .net 4.0, so you could also use the ConcurrentDictionary<A,R>:

public static Func<A, R> Memoize<A, R>(this Func<A, R> f)
{
  var map = new ConcurrentDictionary<A, Lazy<R>>();
  return a =>
    {
      Lazy<R> lazy = new Lazy<R>(() => f(a), LazyExecutionMode.EnsureSingleThreadSafeExecution);
      if(!map.TryAdd(a, lazy))
      {
        return map[a].Value;
      }
      return lazy.Value;
    };
}
Thomas Danecker
A: 

Did you read the comment from Dyer related to thread-safe in the article?

Probably the easiest way to make Memoize thread-safe is to put a lock on map.

This will ensure that the function that is being memoized will only be run once for each set of distinct arguments.

In my example of the RoboRally game, I actually used function memoization to act as "surrogate singleton". It isn't really a singleton since there can be one instance per factory instance (unless the factory is static). But that is exactly what I wanted.

Joe
Yes, that's the _easiest_ way. I specifically said what's bad about it: it stops us from evaluating the function on different arguments concurrently as well.
Alexey Romanov
A: 

You don't want to calculate the same value twice and you want many threads to be able to calculate values and or retrieve values concurrently. To do this you will need to use some sort of condition variable and fine grained locking system.

Heres the idea. when no value is present you put a value into the sync map and then any thread who needs that value will wait for it otherwise you will just grab the current value. this way locking of the map is minimized to querying for values and returning values.

    public static Func<A, R> Memoize<A, R>(this Func<A, R> f)
    {
        var map = new Dictionary<A, R>();
        var mapSync = new Dictionary<A, object>();
        return a =>
        {
            R value;
            object sync = null;
            bool calc = false;
            bool wait = false;
            lock (map)
            {
                if (!map.TryGetValue(a, out value))
                {
                    //its not in the map
                    if (!mapSync.TryGetValue(a, out sync))
                    {
                        //not currently being created
                        sync = new object();
                        mapSync[a] = sync;
                        calc = true;

                    }
                    else
                    {
                        calc = false;
                        wait = true;
                    }
                }
            }
            if(calc)
            {
                lock (sync)
                {
                    value = f(a);
                    lock (map)
                    {
                        map.Add(a, value);
                        mapSync.Remove(a);
                    }
                    Monitor.PulseAll(sync);
                    return value;
                }
            }
            else if (wait)
            {
                lock (sync)
                {
                    while (!map.TryGetValue(a, out value))
                    {
                        Monitor.Wait(sync);
                    }
                    return value;
                }
            }

            lock (map)
            {
                return map[a];
            }

        };
    }

This is just a quick first try but i think it demonstrates the technique. Here you are trading additional memory for speed.

luke