views:

751

answers:

4

In a project I am working on, there are really huge collections (1M-1B elements), and things are modified as collections mostly.

It's a real-time app, and so the performance is paramount.

For some of the operations, like Reverse, BinarySearch (possible?), etc will suffer more than others like Select, etc.

Is it feasible to implement one's own IEnumerable with possible MoveNext, MovePrev, etc and own implemented LINQ extensions that take advantages of these?

If this is gonna happen, it's gonna happen at the end of the project. Because we need to get it working first, then make it faster.

All in all this shouldn't be too much work, right?

+2  A: 

It's very definitely possible to create your own implementation of Enumerable which might special-case some situations. You'd basically want to detect your own collection types (or possibly just collections such as List<T>) and use a more efficient implementation where applicable.

I have a sample project which I used to demo "implementing LINQ to Objects in an hour" which you might like to look at for examples. It's not a full implementation and in particular it's less efficient than the real LINQ to Objects - but you may still find it interesting.

Alternatively, you may find that i4o (Indexed LINQ) does everything you need out of the box - or that you would be better off contributing to that than starting from scratch. Worth checking out.

Just remember that at the end of the day, LINQ is basically a nice design coupled with syntactic sugar. The C# compiler doesn't know anything special about System.Linq.Enumerable, for example.

Jon Skeet
Thanks Jon. I will check out both projects. You said "C# compiler doesn't know anything special...", but I will lose the query syntax, right? Or will the compiler use mine if the names are matching, like Select and Where?
Joan Venge
The latter. The C# compiler knows about the *pattern* but not the specifics. You could implement the query pattern over a *totally* different type if you wanted to. See chapters 11 and 12 of C# in Depth for more details, if you've got it.
Jon Skeet
Thanks Jon. I sure have it. Didn't start yet, but will take a look at those chapters.
Joan Venge
+1  A: 

If you really want performance, you can do quite a lot. Remember that the following selection:

var result = from element in collection
             where element.Id == id
             select element;

Compiles as:

var result = collection.Where(element => element.Id == id);

If you create the following method for the type of collection, then you can exploit the fact that the primary action is equality of the Id member and handle the request in an optimized manner. The important thing is correctly identifying the performance-critical operations on your collection and choosing the correct algorithms (i.e. complexity) to perform them.

public IEnumerable<TElement> Where(Expression<Func<TElement, bool>> selector)
{
    // detect equality of the Id member and return some special value
}
280Z28
+2  A: 

Consider System.Linq.Enumerable.Reverse() - this method fully enumerates the IEnumerable before returning the first result.

If your query is myCollection.Reverse().Take(10), and your collection has Billions of items, it is a horrible idea to enumerate the billions of items to get 10 out.

If you supplied a Reverse method on your own type, you could supply a better implementation that simply loops backwards over the collection (by index possibly).

The key to this is supplying your own type where you control the implementations. You can't use the implementations that work for all IEnumerable<T> because those implementations won't take full advantage of the capabilities of your custom collection type.

David B
Thanks David, that's a good example I was thinking about.
Joan Venge
+1  A: 

Is it feasible to implement one's own IEnumerable with possible MoveNext, MovePrev, etc and own implemented LINQ extensions that take advantages of these?

IEnumerable (or more properly, IEnumerator) doesn't have MovePrev. You could define an interface:

public interface IReversable<T> : IEnumerable<T>
{
    IEnumerator<T> GetReverseEnumerator();
}

This could be implemented by any container that supports efficient reverse enumeration.

You could then write an overload of Reverse (the extension method) to work off this new interface, and collection classes that implement the interface, etc. And then you'd have to use those collection classes instead of standard ones like List<T>.

But (I don't have Reflector handy to check) it may be that the built-in Reverse is smart enough to do things the quick way if it can get the IList interface from the collection, which would optimise the most common cases just fine anyway.

So there may not be a lot of point in this kind of approach.

Daniel Earwicker
Thanks Earwicker. This way the correct method would be called over my collection, right? i.e. it wouldn't call default Reverse on MyCollection.
Joan Venge
Please note I'm saying there's probably NOT a lot of point doing this. And if you ever want to be sure that a specific method implementation will be called, you can always give it a new, different name. This has the added advantage that people reading your code will know your intentions, as well as the compiler. But yes, you could do this. But download Reflector and look at what Reverse does for `IList` before going to any manual effort - I don't know if it handles it differently, but it might.
Daniel Earwicker
I didn't see anything with IList in the Reverse iterator.
Joan Venge