[EDIT]

The new Reactive Framework solves the problem outlined below, using the System.Linq.EnumerableEx.MemoizeAll() extension method.

Internally, MemoizeAll() uses a System.Linq.EnumerableEx.MemoizeAllEnumerable<T> (found in the System.Interactive assembly), which is similar to my ThreadSafeCachedEnumerable<T> (sorta).

Here's an awfully contrived example that prints the contents of an Enumerable (numbers 1-10) very slowly, then quickly prints the contents a second time (because it cached the values):

// Create an IEnumerable<int> containing the numbers 1-10, using
// Thread.Sleep() to simulate work on each step
var slowEnum = EnumerableEx.Generate(
    1,
    currentNum => currentNum <= 10,
    currentNum => currentNum,
    previousNum => { Thread.Sleep(250); return previousNum + 1; });

// This decorates the slow enumerable with one that will cache each value.
var cachedEnum = slowEnum.MemoizeAll();

// Print the numbers
foreach (var num in cachedEnum.Repeat(2))
{
    Console.WriteLine(num);
}

[/EDIT]

Hello multi-threading gurus,

I created the ThreadSafeCachedEnumerable class intending to increase performance where long-running queries were being reused. The idea was to get an enumerator from an IEnumerable and add items to a cache on each call to MoveNext(). The following is my current implementation:

/// <summary>
/// Wraps an IEnumerable&lt;T&gt; and provides a thread-safe means of caching the values.
/// </summary>
/// <typeparam name="T"></typeparam>
class ThreadSafeCachedEnumerable<T> : IEnumerable<T>
{
    // An enumerator from the original IEnumerable<T>
    private IEnumerator<T> enumerator;

    // The items we have already cached (from this.enumerator)
    private IList<T> cachedItems = new List<T>();

    public ThreadSafeCachedEnumerable(IEnumerable<T> enumerable)
    {
        this.enumerator = enumerable.GetEnumerator();
    }

    #region IEnumerable<T> Members

    public IEnumerator<T> GetEnumerator()
    {
        // The index into the sequence
        int currentIndex = 0;

        // We will exit the loop with yield break once the underlying enumerator is exhausted
        while (true)
        {
            // The currentIndex will never be decremented,
            // so we can check without locking first
            if (currentIndex < this.cachedItems.Count)
            {
                var current = this.cachedItems[currentIndex];
                currentIndex += 1;
                yield return current;
            }
            else
            {
                // If !(currentIndex < this.cachedItems.Count),
                // we need to synchronize access to this.enumerator
                lock (enumerator)
                {
                    // See if we have more cached items ...
                    if (currentIndex < this.cachedItems.Count)
                    {
                        var current = this.cachedItems[currentIndex];
                        currentIndex += 1;
                        yield return current;
                    }
                    else
                    {
                        // ... otherwise, we'll need to get the next item from this.enumerator.MoveNext()
                        if (this.enumerator.MoveNext())
                        {
                            // capture the current item and cache it, then increment the currentIndex
                            var current = this.enumerator.Current;
                            this.cachedItems.Add(current);
                            currentIndex += 1;
                            yield return current;
                        }
                        else
                        {
                            // We reached the end of the enumerator - we're done
                            yield break;
                        }
                    }
                }
            }
        }
    }

    #endregion

    #region IEnumerable Members

    System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
    {
        return this.GetEnumerator();
    }

    #endregion
}


I simply "lock (this.enumerator)" when no more items appear to be in the cache, just in case another thread is about to add another item (I assume that calling MoveNext() on this.enumerator from two threads is a bad idea).
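For context, here's roughly how I exercise it from two threads. This is a contrived sketch - the slow query and the thread count are made up for illustration:

// Contrived: a lazy "query" that takes 10 ms to produce each item
var slowQuery = Enumerable.Range(1, 100)
    .Select(n => { Thread.Sleep(10); return n; });

var cached = new ThreadSafeCachedEnumerable<int>(slowQuery);

// Two threads enumerate concurrently; each item should be produced once
// by the underlying enumerator and served from the cache thereafter.
var t1 = new Thread(() => { foreach (var n in cached) { /* consume */ } });
var t2 = new Thread(() => { foreach (var n in cached) { /* consume */ } });
t1.Start();
t2.Start();
t1.Join();
t2.Join();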

The performance is great when retrieving previously cached items, but it starts to suffer when getting many items for the first time (due to the constant locking). Any suggestions for increasing the performance?

Thanks!

+5  A: 

A couple of recommendations:

  1. It is now generally accepted practice not to make container classes responsible for locking. Someone calling your cached enumerator, for instance, might also want to prevent new entries from being added to the container while enumerating, which means that locking would occur twice. Therefore, it's best to defer that responsibility to the caller (see the sketch after this list).
  2. Your caching depends on the enumerator always returning items in order, which is not guaranteed. It's better to use a Dictionary or HashSet. Similarly, items may be removed in between calls, invalidating the cache.
  3. It is generally not recommended to establish locks on publicly accessible objects. That includes the wrapped enumerator. Exceptions are conceivable, for example when you're absolutely certain you're the only instance holding a reference to the container class you're enumerating over. This would also largely invalidate my objections under #2.
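
To illustrate #1, here is a minimal sketch of leaving synchronization to the caller. CachedEnumerable<T> is a hypothetical lock-free variant of your class, and expensiveQuery stands in for any slow source:

// Hypothetical: the same caching decorator, but with no internal locking.
var cache = new CachedEnumerable<int>(expensiveQuery);
var sync = new object();

// The caller picks the locking granularity: here the whole enumeration is
// one critical section, so there is no per-item locking overhead.
lock (sync)
{
    foreach (var item in cache)
    {
        // consume item
    }
}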
Michiel Buddingh'
+1 that "It is now generally accepted practice not to make container classes responsible for locking". I wish everyone got that memo.
Frank Schwieterman
Those are nice recommendations - here are my thoughts: 1) That would add too much complexity for my purposes. I just want to cache the results of expensive LINQ projections, where I may only need a few of the results. A lazy loaded list, essentially, only without being able to access an element by index. I'd agree with you for regular containers, though. 2) For my purposes, that should be understood by the caller as a caveat. 3) I can't imagine an IEnumerable holding a reference to one of its IEnumerators, but I suppose it's a possibility. Thanks for the suggestions.
Charles
Generally accepted practice, eh? Does that explain the new `System.Collections.Concurrent` namespace in .NET 4.0?
Dan Tao
Haven't studied 4.0 in any detail, but on the face of it, that new namespace seems like a nice indication that 'thread-safe' containers are the exception, not the rule. But you're probably right that 'generally accepted practice' is a bold expression.
Michiel Buddingh'
+2  A: 

Locking in .NET is normally very quick (if there is no contention). Has profiling identified locking as the source of the performance problem? How long does it take to call MoveNext on the underlying enumerator?

Additionally, the code as it stands is not thread-safe. You cannot safely call this.cachedItems[currentIndex] on one thread (in if (currentIndex < this.cachedItems.Count)) while invoking this.cachedItems.Add(current) on another. From the List(T) documentation: "A List(T) can support multiple readers concurrently, as long as the collection is not modified." To be thread-safe, you would need to protect all access to this.cachedItems with a lock (if there's any chance that one or more threads could modify it).
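
For illustration, one way to "protect all access" might look like the sketch below. It is only a sketch of the loop body of GetEnumerator(): it gives up the lock-free fast path, and it locks on cachedItems itself purely for brevity (a dedicated lock object would be better):

// Sketch: every read and write of this.cachedItems happens under one lock,
// so the list can never be read while another thread is resizing it in Add().
while (true)
{
    T current;
    lock (this.cachedItems)
    {
        if (currentIndex < this.cachedItems.Count)
        {
            current = this.cachedItems[currentIndex];
        }
        else if (this.enumerator.MoveNext())
        {
            current = this.enumerator.Current;
            this.cachedItems.Add(current);
        }
        else
        {
            // End of the underlying enumerator - we're done
            yield break;
        }
        currentIndex += 1;
    }

    // Yield outside the lock so a slow consumer doesn't hold it.
    yield return current;
}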

Bradley Grainger
That's a valid point, but do you know if an exception could be thrown? The idea behind not locking was that I could count on the list only getting larger and the index never decreasing, so I could bypass the need for locking if (currentIndex < this.cachedItems.Count) - otherwise, I could lock and double-check. I played around with profiling it using a Stopwatch - iterating through 20 million items the first time adds another 2 seconds (it's negligible, I suppose).
Charles
The main thing I would be concerned about is another thread resizing the List's internal array (in Add()) while the reader thread is using the indexer to retrieve an item. It seems possible that it could return default(T) or throw an ArgumentOutOfRangeException. Of course, this all depends on the exact implementation of List<T>, so the best I could say is that the behaviour is "undefined". (Even if Reflector shows you that it would be safe, who knows if it could change in .NET 4.0, introducing a subtle and hard-to-find bug?)
Bradley Grainger