I'm trying to optimize a concurrent collection that minimizes lock contention for reads. My first pass used a linked list, which allowed me to lock only on writes while many simultaneous reads could continue unblocked. This used a custom IEnumerator to yield the next link's value. Once I started comparing iteration over the collection against a plain List<T>, I noticed my implementation was about half as fast (for from x in c select x over a collection of 1 million items, I got 24ms for List<T> and 49ms for my collection).
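
In sketch form, that first pass looked roughly like this (simplified; names are illustrative and the write lock is the only synchronization):

    using System.Collections;
    using System.Collections.Generic;

    // Sketch of a singly linked list that locks only on writes; readers walk
    // the links without taking any lock.
    public class LinkedCollection<T> : IEnumerable<T>
    {
        private class Node
        {
            public T Value;
            public Node Next;
        }

        private readonly object _writeLock = new object();
        private Node _head;
        private Node _tail;

        // Writers serialize on a lock; readers never take it.
        public void Add(T item)
        {
            var node = new Node { Value = item };
            lock (_writeLock)
            {
                if (_tail == null) { _head = _tail = node; }
                else { _tail.Next = node; _tail = node; }
            }
        }

        // Readers simply walk the links, yielding each value without locking.
        public IEnumerator<T> GetEnumerator()
        {
            for (Node node = _head; node != null; node = node.Next)
                yield return node.Value;
        }

        IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
    }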

So I thought I'd use a ReaderWriterLockSlim and sacrifice a little contention on reads so I could use a List<T> as my internal storage. Since I have to acquire the read lock at iteration start and release it upon completion, I first used a yield pattern for my IEnumerable, foreach-ing over the internal List<T>. Now I was getting 66ms.
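
Roughly like this (simplified sketch; note that because it is a yield-based iterator, the read lock is actually taken on the first MoveNext rather than when GetEnumerator is called):

    using System.Collections;
    using System.Collections.Generic;
    using System.Threading;

    public class LockedListCollection<T> : IEnumerable<T>
    {
        private readonly ReaderWriterLockSlim _lock = new ReaderWriterLockSlim();
        private readonly List<T> _items = new List<T>();

        public void Add(T item)
        {
            _lock.EnterWriteLock();
            try { _items.Add(item); }
            finally { _lock.ExitWriteLock(); }
        }

        public IEnumerator<T> GetEnumerator()
        {
            // Because this is an iterator block, the lock is acquired lazily,
            // on the first MoveNext call, and released when enumeration
            // finishes or the enumerator is disposed.
            _lock.EnterReadLock();
            try
            {
                foreach (T item in _items)
                    yield return item;
            }
            finally
            {
                _lock.ExitReadLock();
            }
        }

        IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
    }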

I peeked at what List<T> actually does: it uses an internal T[] store and a custom IEnumerator that moves an index forward and returns the value at the current index. Now, manually using a T[] as storage means a lot more maintenance work, but wth, I'm chasing microseconds.
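
The shape of that enumerator is roughly the following (simplified sketch; no locking or version checks):

    using System.Collections;
    using System.Collections.Generic;

    // Hand-rolled enumerator that just advances an index over a T[] store,
    // in the spirit of List<T>.Enumerator.
    public class ArrayEnumerator<T> : IEnumerator<T>
    {
        private readonly T[] _items;
        private readonly int _count;
        private int _index;
        private T _current;

        public ArrayEnumerator(T[] items, int count)
        {
            _items = items;
            _count = count;
            _index = 0;
            _current = default(T);
        }

        public T Current { get { return _current; } }
        object IEnumerator.Current { get { return _current; } }

        public bool MoveNext()
        {
            if (_index < _count)
            {
                _current = _items[_index++];
                return true;
            }
            return false;
        }

        public void Reset() { _index = 0; _current = default(T); }
        public void Dispose() { }
    }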

However, even mimicking that index-moving IEnumerator over an array, the best I could do was about 38ms. So what gives List<T> its secret sauce, or alternatively, what's a faster implementation for an iterator?

UPDATE: Turns out my main speed culprit was running a Debug compile, while List<T> is obviously a Release compile. In Release my implementation is still a hair slower than List<T>, although on Mono it's now faster.

One other suggestion I got from a friend is that the BCL is faster because it's in the GAC and can therefore be pre-compiled by the system. I'll have to put my test in the GAC to check that theory.

+4  A: 

Acquiring and releasing the lock on each iteration sounds like a bad idea - because if you perform an Add or Remove while you're iterating over the list, that will invalidate the iterator. List<T> certainly wouldn't like that, for example.

Would your use case allow callers to take out a ReaderWriterLockSlim around their whole process of iteration, instead of on a per-item basis? That would be more efficient and more robust. If not, how are you planning to deal with the concurrency issue? If a writer adds an element earlier than the place where I've got to, a simple implementation would return the same element twice. The opposite would happen with a removal - the iterator would skip an element.
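
For example, something along these lines on the calling side (a hedged sketch; passing the lock in alongside the collection is purely for illustration):

    using System;
    using System.Collections.Generic;
    using System.Threading;

    static class IterationExample
    {
        // Hypothetical pattern: the caller takes the read lock once, around
        // the whole iteration, rather than per item.
        static void ProcessAll<T>(IEnumerable<T> collection,
                                  ReaderWriterLockSlim rwLock,
                                  Action<T> process)
        {
            rwLock.EnterReadLock();
            try
            {
                foreach (T item in collection)
                {
                    process(item);
                }
            }
            finally
            {
                rwLock.ExitReadLock();
            }
        }
    }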

Finally, is .NET 4.0 an option? I know there are some highly optimised concurrent collections there...

EDIT: I'm not quite sure what your current situation is in terms of building an iterator by hand, but one thing that you might want to investigate is using a struct for the IEnumerator<T>, and making your collection explicitly declare that it returns that - that's what List<T> does. It does mean using a mutable struct, which makes kittens cry all around the world, but if this is absolutely crucial to performance and you think you can live with the horror, it's at least worth a try.
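
A rough sketch of that pattern (illustrative names only; the key parts are the public struct enumerator and the GetEnumerator method declared to return it, so foreach avoids boxing and interface dispatch):

    using System.Collections;
    using System.Collections.Generic;

    public class FastCollection<T> : IEnumerable<T>
    {
        private T[] _items = new T[4];
        private int _count;

        public void Add(T item)
        {
            if (_count == _items.Length)
            {
                T[] larger = new T[_items.Length * 2];
                System.Array.Copy(_items, larger, _count);
                _items = larger;
            }
            _items[_count++] = item;
        }

        // foreach binds to this public method and uses the struct directly.
        public Enumerator GetEnumerator() { return new Enumerator(this); }

        // Interface callers fall back to the boxed struct.
        IEnumerator<T> IEnumerable<T>.GetEnumerator() { return GetEnumerator(); }
        IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }

        public struct Enumerator : IEnumerator<T>
        {
            private readonly FastCollection<T> _owner;
            private int _index;
            private T _current;

            internal Enumerator(FastCollection<T> owner)
            {
                _owner = owner;
                _index = 0;
                _current = default(T);
            }

            public T Current { get { return _current; } }
            object IEnumerator.Current { get { return _current; } }

            public bool MoveNext()
            {
                if (_index < _owner._count)
                {
                    _current = _owner._items[_index++];
                    return true;
                }
                return false;
            }

            public void Reset() { _index = 0; _current = default(T); }
            public void Dispose() { }
        }
    }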

Jon Skeet
Jon - I updated my post to clarify that I am acquiring the lock at iteration start and releasing it once the iteration is completed. The point of the collection is precisely the issue that a modification would invalidate the iterator, hence my first reader-lock-less implementation and my current consideration of ReaderWriterLockSlim to allow many concurrent readers with cheap locking semantics. .NET 4.0 is not an option yet, since this code needs to run on Mono.
Arne Claassen
The high-performance collections from .NET 4 were ported to .NET 3.5 by the Reactive Extensions team. "System.Threading, backport of Parallel Extensions for .NET 4 to .NET 3.5 SP1" from the RX Team Blog: http://blogs.msdn.com/rxteam/archive/2009/11/17/release-notes.aspx
280Z28
Oh, and yes that's as of like 6 hours ago lol.
280Z28
Tried struct vs. class and found no difference. Then I found the biggest difference, i.e. my tests were using a Debug build (doh). I pretty much copied the List<T> enumerator line by line and now I'm getting 20% slower perf. I guess the BCL is built with better optimization than a VS Release build.
Arne Claassen