I'm trying to optimize a concurrent collection that tries to minimize lock contention for reads. First pass was using a linked list, which allowed me to only lock on writes while many simultaneous reads could continue unblocked. This used a custom IEnumerator
to yield the next link value. Once i started comparing iteration over the collection to a plain List<T>
i noticed my implementation was about half as fast (for from x in c select x
on a collection of 1*m* items, i got 24ms for List<T>
and 49ms for my collection).
So i thought i'd use a ReaderWriteLockSlim
and sacrifice a little contention on reads so i could use a List<T>
as my internal storage. Since I have to capture the read lock on iteration start and release it upon completion, i first did a yield pattern for my IEnumerable
, foreaching over the internal List<T>
. Now i was getting only 66ms.
I peeked at what List actually does and it uses an internal store of T[]
and a custom IEnumerator
that moves the index forward and returns the current index value. Now, manually using T[]
as storage means a lot more maintenance work, but wth, i'm chasing microseconds.
However even mimicking the IEnumerator
moving the index on an array, the best I could do was about ~38ms. So what gives List<T>
its secret sauce or alternatively what's a faster implementation for an iterator?
UPDATE: Turns out my main speed culprit was running Debug compile, while List<T>
is obviously a Release compile. In release my implementation is still a hair slower than List<T>
, altough on mono it's now faster.
One other suggestion i got from a friend is that the BCL is faster because it's in the GAC and therefore can get pre-compiled by the system. Will have to put my test in the GAC to test that theory.