I'm putting together a custom SynchronizedCollection<T> class so that I can have a synchronized Observable collection for my WPF application. The synchronization is provided via a ReaderWriterLockSlim, which, for the most part, has been easy to apply. The case I'm having trouble with is how to provide thread-safe enumeration of the collection. I've created a custom IEnumerator<T> nested class that looks like this:

    private class SynchronizedEnumerator : IEnumerator<T>
    {
        private SynchronizedCollection<T> _collection;
        private int _currentIndex;

        internal SynchronizedEnumerator(SynchronizedCollection<T> collection)
        {
            _collection = collection;
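            // The read lock is held for the enumerator's entire lifetime and is only released in Dispose().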
            _collection._lock.EnterReadLock();
            _currentIndex = -1;
        }

        #region IEnumerator<T> Members

        public T Current { get; private set;}

        #endregion

        #region IDisposable Members

        public void Dispose()
        {
            var collection = _collection;
            if (collection != null)
                collection._lock.ExitReadLock();

            _collection = null;
        }

        #endregion

        #region IEnumerator Members

        object System.Collections.IEnumerator.Current
        {
            get { return Current; }
        }

        public bool MoveNext()
        {
            var collection = _collection;
            if (collection == null)
                throw new ObjectDisposedException("SynchronizedEnumerator");

            _currentIndex++;
            if (_currentIndex >= collection.Count)
            {
                Current = default(T);
                return false;
            }

            Current = collection[_currentIndex];
            return true;
        }

        public void Reset()
        {
            if (_collection == null)
                throw new ObjectDisposedException("SynchronizedEnumerator");

            _currentIndex = -1;
            Current = default(T);
        }

        #endregion
    }

My concern, however, is that if the Enumerator is not Disposed, the lock will never be released. In most use cases this is not a problem, as foreach should properly call Dispose. It could be a problem, however, if a consumer retrieves an explicit Enumerator instance. Is my only option to document the class with a caveat reminding the consumer to call Dispose when using the Enumerator explicitly, or is there a way to safely release the lock during finalization? I'm thinking not, since the finalizer doesn't even run on the same thread, but I was curious whether there are other ways to improve this.


EDIT

After thinking about this a bit and reading the responses (particular thanks to Hans), I've decided this is definitely a bad idea. The biggest issue actually isn't forgetting to Dispose, but rather a leisurely consumer causing a deadlock while enumerating. I now hold the read lock only long enough to make a copy, then return an enumerator over that copy.
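
For reference, a minimal sketch of that snapshot approach, assuming the collection keeps its items in a private List<T> field called _items (not shown in the code above):

    public IEnumerator<T> GetEnumerator()
    {
        List<T> snapshot;

        _lock.EnterReadLock();
        try
        {
            // Copy while holding the read lock, then release it before enumeration starts.
            snapshot = new List<T>(_items);
        }
        finally
        {
            _lock.ExitReadLock();
        }

        // Enumerating the snapshot cannot block writers or deadlock.
        return snapshot.GetEnumerator();
    }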

+1  A: 

In the most commonly used IDisposable implementation, you create a protected Dispose(bool disposing) method which always disposes the unmanaged resources you use. By calling the protected Dispose(false) from your finalizer, you dispose those unmanaged resources as required. The lock is managed, so you'll only dispose it when Dispose(true) is called, where true means that managed objects need to be disposed. When the public Dispose() is called explicitly, it calls the protected Dispose(true) and also GC.SuppressFinalize(this) to prevent the finalizer from running (because there is nothing left to dispose of).

Because you never know when the user is done with the enumerator, you have no other option than documenting that the user has to dispose the object. You might want to propose that the user uses a using(){ ... } construction, which automatically disposes the object when done.
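
In outline, the pattern looks something like this (a sketch showing only the dispose plumbing; the class and field names are illustrative, and the managed lock is released only on the Dispose(true) path):

using System;
using System.Threading;

public class LockHoldingEnumerator : IDisposable
{
    private ReaderWriterLockSlim _lock;   // managed resource: the lock we entered

    public void Dispose()
    {
        Dispose(true);
        GC.SuppressFinalize(this);        // nothing left for the finalizer to do
    }

    protected virtual void Dispose(bool disposing)
    {
        if (disposing)
        {
            // Managed cleanup: only performed on the explicit Dispose() path.
            if (_lock != null)
            {
                _lock.ExitReadLock();
                _lock = null;
            }
        }
        // Unmanaged cleanup (there is none here) would go outside the check,
        // so it runs from both Dispose() and the finalizer.
    }

    ~LockHoldingEnumerator()
    {
        Dispose(false);
    }
}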

Virtlink
I'm pretty sure I can't safely release the lock in the finalizer. Perhaps I should release the lock when the enumeration reaches the end of the collection in MoveNext and reacquire it if the consumer calls Reset? This would at least guarantee it's released if the consumer enumerates all the way through, but forgets to call Dispose.
Dan Bryant
Any instance of a class implementing IDisposable should be disposed of by the user appropriately. Failing to do so should be seen as an error.
Eric
I think your suggestion would work. However, when I enumerate half the collection, I can still forget to call `Dispose()`.
Virtlink
+1 for the corrected pattern. Typically, though, I don't bother with the full Dispose pattern if I don't have a finalizer and the class is sealed. There's no need to tell the GC to suppress my finalization if I can guarantee that I will never have a finalizer.
Dan Bryant
+1  A: 

I had to do this recently. The way I did it was to abstract it so that there is an inner object (a reference) that contains both the actual list/array and the count (and a GetEnumerator() implementation); then I can do lock-free, thread-safe enumeration by having:

public IEnumerator<T> GetEnumerator() { return inner.GetEnumerator();}

The Add etc need to be synchronized, but they change the inner reference (since reference updates are atomic, you don't need to synchronize GetEnumerator()). This then means that any enumerator will return as many items as were there when the enumerator was created.

Of course, it helps that my scenario was simple, and my list was Add only... if you need to support mutate / remove then it is much trickier.
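
A rough sketch of that shape, assuming an add-only collection (the Inner type and member names are invented for illustration; here the array is always exactly sized, so the count is just its length):

using System;
using System.Collections;
using System.Collections.Generic;

public class AddOnlyCollection<T> : IEnumerable<T>
{
    // Writers never mutate an Inner in place; they build a new one and swap the reference.
    private sealed class Inner
    {
        public readonly T[] Items;
        public Inner(T[] items) { Items = items; }
    }

    private readonly object _writeLock = new object();
    private volatile Inner _inner = new Inner(new T[0]);

    public void Add(T item)
    {
        lock (_writeLock)
        {
            T[] old = _inner.Items;
            T[] items = new T[old.Length + 1];
            Array.Copy(old, items, old.Length);
            items[old.Length] = item;
            _inner = new Inner(items);   // atomic reference swap publishes the new contents
        }
    }

    public int Count
    {
        get { return _inner.Items.Length; }
    }

    public IEnumerator<T> GetEnumerator()
    {
        // No lock needed: reading _inner is an atomic reference read, and the
        // array it points to is never modified after it has been published.
        return ((IEnumerable<T>)_inner.Items).GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}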

Marc Gravell
I was confused until I read your last bit. Unfortunately my collection does need to mutate and allow removal. This is intended as a generic replacement for ObservableCollection that synchronizes thread access and guarantees dispatching of INotifyCollectionChanged callbacks on the UI thread (technically on the SynchronizationContext of whoever created the collection class.)
Dan Bryant
@Dan - fair enough. I thought it was worth mentioning in case you could use a simple approach.
Marc Gravell
+2  A: 

This seems far too error-prone to me. It encourages situations in which a lock is implicitly/silently taken out, in a way that is not clear to the reader of the code, and makes it likely that a crucial fact about the interface will be misunderstood.

Usually it's a good idea to follow common patterns - representing an enumerable collection with an IEnumerable<T> that is disposed when you're done with it - but the added ingredient of taking out a lock makes a big difference, unfortunately.

I'd suggest the ideal approach would be to not offer enumeration on collections shared between threads at all. Try to design the whole system so it isn't needed. Obviously this is going to be a crazy pipe-dream sometimes.

So the next best thing would be to define a context within which an IEnumerable<T> is temporarily available, while the lock exists:

public class SomeCollection<T>
{
    // ...

    public void EnumerateInLock(Action<IEnumerable<T>> action) ...

    // ...
}

That is, when the user of this collection wants to enumerate it, they do this:

someCollection.EnumerateInLock(e =>
    {
        foreach (var item in e)
        {
            // blah
        }
    });

This makes the lifetime of the lock explicitly stated by a scope (represented by the lambda body, working much like a lock statement), and impossible to extend accidentally by forgetting to dispose. It's impossible to abuse this interface.

The implementation of the EnumerateInLock method would be like this:

public void EnumerateInLock(Action<IEnumerable<T>> action)
{
    var e = new EnumeratorImpl(this);

    // Acquire the lock before the try block, so a failed acquisition
    // doesn't cause ExitReadLock to run in the finally.
    _lock.EnterReadLock();
    try
    {
        action(e);
    }
    finally
    {
        e.Dispose();
        _lock.ExitReadLock();
    }
}

Notice how the EnumeratorImpl (which needs no particular locking code of its own) is always disposed before the lock is exited. After disposal, it throws ObjectDisposedException in response to any method call (other than Dispose, which is ignored.)

This means that even if there is an attempt to abuse the interface:

IEnumerable<C> keepForLater = null;
someCollection.EnumerateInLock(e => keepForLater = e);

foreach (var item in keepForLater)
{
    // aha!
}

This will always throw, rather than failing mysteriously sometimes based on the timing.

Using a method that accepts a delegate like this is a general technique for managing resource lifetimes commonly used in Lisp and other dynamic languages, and while it is less flexible than implementing IDisposable, that reduced flexibility is often a blessing: it removes the concern over clients "forgetting to dispose".

Update

From your comment, I see that you need to be able to hand a reference to the collection to an existing UI framework, which will therefore expect to be able to use the normal interface to a collection, i.e. directly get an IEnumerable<T> from it and be trusted to clean it up quickly. In which case, why worry? Trust the UI framework to update the UI and dispose the collection rapidly.

Your only other realistic option is simply to make a copy of the collection when the enumerator is requested. That way, the lock only needs to be held when the copy is being made. As soon as it's ready, the lock is released. This may be more efficient if the collections are usually small, so the overhead of the copy is less than the performance saving due to shorter locks.

It's tempting (for about a nanosecond) to suggest that you use a simple rule: if the collection is smaller than some threshold, make the copy, otherwise do it in your original way; choose the implementation dynamically. That way you get the optimum performance - set the threshold (by experiment) such that the copy is cheaper than holding the lock. However, I'd always think twice (or a billion times) about such "clever" ideas in threaded code, because what if there is an abuse of the enumerator somewhere? If you forget to dispose it, you won't see a problem unless it's a large collection... A recipe for hidden bugs. Don't go there!

Another potential drawback of the "expose a copy" approach is that clients will inevitably assume that an item is exposed to the world while it is in the collection, but safely hidden as soon as it is removed from the collection. This will now be wrong! The UI thread will obtain an enumerator, then my background thread will remove the last item from the collection, and then begin mutating that item in the mistaken belief that, because it was removed, no one else can see it.

So the copying approach requires every item on the collection to effectively have its own synchronization, where most coders will assume that they can shortcut this by using the collection's synchronization instead.

Daniel Earwicker
While this ensures that the lock will eventually be released, I would still consider it problematic, because it calls unknown code while the lock is being held. You don't know how long this unknown code takes, or whether it may block.
oefe
I like this solution, but the design goal for this class is specifically to interoperate with data-binding in the UI in a safe way, so that various threads can mutate a collection and the UI will reflect the changes appropriately. Since the UI will always access the collection through IEnumerable, it's a firm requirement that the enumeration be synchronized. The UI won't always enumerate (intelligent handling of INotifyCollectionChanged doesn't require re-enumerating), but it can, which means I need to synchronize.
Dan Bryant
@oefe - compare with the `lock` statement: it also calls "unknown code" (the block that follows it) while the lock is held. But it's called `lock`, so you know it takes out a lock. That's a big part of why I chose the name `EnumerateInLock` - to make it clear that the method call, the lock and the enumerator all have the same lifetime.
Daniel Earwicker
@Dan Bryant - see my update; I'd stick with what you have in that case; the standard UI framework had better dispose its enumerators correctly, surely! :) And as enumeration should be the rarer case, it won't have a major impact on performance overall. No point optimising the rarer case.
Daniel Earwicker
+2  A: 

You are right, that's a problem. The finalizer is useless; it will run far too late to be of any use. The code should have deadlocked heavily before that anyway. Unfortunately, there's no way for you to tell the difference between a foreach calling your MoveNext/Current members and the client code using them explicitly.

No fix; don't do this. Microsoft didn't do it either, even though they had plenty of reason to back in .NET 1.x. The only real thread-safe iterator you can make is one that creates a copy of the collection object in the GetEnumerator() method. The iterator getting out of sync with the collection is no joy either, though.

Hans Passant
Given that this is designed for UI integration and thus the collection size should not become huge, copying the collection and returning an enumerator for the copy is, I think, the best solution. There is potentially a large memory overhead for a large collection, but the performance may actually be better, since a slow enumerator won't lock up the collection for any writers. In fact, that's a bigger risk than not Disposing; imagine code that does some long-running operation on each element in the collection.
Dan Bryant
@Dan Bryant - I'd strongly advise not using the "return a copy" approach, for the reason given toward the end of my updated answer.
Daniel Earwicker
@Daniel, that puts me at a bit of an impasse (bad threading joke). I think the "return a copy" approach is okay for general enumeration; now I'm starting to fret about possible race conditions with my INotifyCollectionChanged callbacks. I'm starting to understand why MS didn't already have a thread-safe ObservableCollection for easy integration with WPF data binding...
Dan Bryant
"Return a copy" is only okay if all users of the collection class understand that they are responsible for ensuring the thread safety of every item in the collection. They cannot use the insert/remove operations to control the visibility of items to other threads. Removing an item from the collection does not make it safe to modify it freely. This would go against most user's assumptions about thread-safe collections. This is exactly the situation with events in .NET: delisting from an event doesn't mean your handler won't be called any more. The firing code is looping through a copy...
Daniel Earwicker