views:

83

answers:

2

I've been poking around mscorlib to see how the generic collection optimized their enumerators and I stumbled on this:

// in List<T>.Enumerator<T>
public bool MoveNext()
{
    List<T> list = this.list;
    if ((this.version == list._version) && (this.index < list._size))
    {
        this.current = list._items[this.index];
        this.index++;
        return true;
    }
    return this.MoveNextRare();
}

The stack size is 3, and the size of the bytecode should be 80 bytes. The naming of the MoveNextRare method got me on my toes and it contains an error case as well as an empty collection case, so obviously this is breaching separation of concern.

I assume the MoveNext method is split this way to optimize stack space and help the JIT, and I'd like to do the same for some of my perf bottlenecks, but without hard data, I don't want my voodoo programming turning into cargo-cult ;)

Thanks! Florian

+3  A: 

If you're going to think about ways in which List<T>.Enumerator is "odd" for the sake of performance, consider this first: it's a mutable struct. Feel free to recoil with horror; I know I do.

Ultimately, I wouldn't start mimicking optimisations from the BCL without benchmarking/profiling what difference they make in your specific application. It may well be appropriate for the BCL but not for you; don't forget that the BCL goes through the whole NGEN-alike service on install. The only way to find out what's appropriate for your application is to measure it.

You say you want to try the same kind of thing for your performance bottlenecks: that suggests you already know the bottlenecks, which suggests you've got some sort of measurement in place. So, try this optimisation and measure it, then see whether the gain in performance is worth the pain of readability/maintenance which goes with it.

There's nothing cargo-culty about trying something and measuring it, then making decisions based on that evidence.

Jon Skeet
I was looking for specs on the JITter, but thanks for pointing-out the mutable struct! I had a look at how the call was resolved and it turns out it isn't boxed unless the List<T> is referenced as an IList<T>. Also, List<T>.Enumerator<T> instances aren't going to be referenced directly then passed around as is, they're bound to be cast to IEnumerator<T> then boxed rather than copied, so we're safe-ish as no-one uses IEnumerator<T> arguments, right?It's a neat and dangerous trick that marries the general use-case around iterators and clr deep magic. Won't reproduce it though :)
Florian Doyon
+1  A: 

Separating it into two functions has some advantages:

If the method were to be inlined, only the fast path would be inlined and the error handling would still be a function call. This prevents inlining from costing too much extra space. But 80 bytes of IL is probably still above the threshold for inlining (it was once documented as 32 bytes, don't know if it's changed since .NET 2.0).

Even if it isn't inlined, the function will be smaller and fit within the CPU's instruction cache more easily, and since the slow path is separate, it won't have to be fetched into cache every time the fast path is.

It may help the CPU branch predictor optimize for the more common path (returning true).

I think that MoveNextRare is always going to return false, but by structuring it like this it becomes a tail call, and if it's private and can only be called from here then the JIT could theoretically build a custom calling convention between these two methods that consists of just a jmp instruction with no prologue and no duplication of epilogue.

Ben Voigt
Didn't think about the JIT overhead. Thanks.
Florian Doyon