I have a large Collection of objects (which I can expose as an enumerable using OfType<>()). Each object has a Category property whose value is drawn from a list defined elsewhere in the application. The Collection can reach hundreds of items, but it is possible that only, say, 6 of the 30 possible Categories are actually used. What is the fastest way to find these 6 Categories? The size of the Collection discourages me from just iterating over the entire thing and returning all unique values, so is there a faster way to accomplish this?

Ideally I'd collect the categories into a List<string>.

+2  A: 

If you are using .NET 3.5 then try this:

List<string> categories = collection
    .Cast<Foo>()
    .Select(foo => foo.Category)
    .Distinct()
    .ToList();

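It should be very fast: Distinct() makes a single pass over the items and uses a hash-based set internally, so a few hundred items take negligible time.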

I assume these objects originally came from a database? If so, you might want to ask the database to do the work for you (for example with a SELECT DISTINCT query on that column). If there is an index on that column, you will get the result almost instantly, without even having to fetch the objects into memory.

Mark Byers
The collection of files and the list of categories are both stored in a database, yes. The Category column for the files, however, is not indexed, and I am not permitted to change that just yet. This LINQ query looks like it should do the trick.
ccomet
A: 

The size of the huge Collection discourages me from just iterating across the entire thing and returning all unique values

I am afraid that to find all used categories you will have to look at each item at least once, so you can hardly avoid iterating (unless you keep track of the used categories while building your collection).
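To illustrate the "keep track while building" idea, here is a minimal sketch. It assumes you control the code that fills the collection; the names Foo, TrackedCollection and GetUsedCategories are made up for the example and are not part of your application.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical item type; only the Category property comes from the question.
public class Foo
{
    public string Category { get; set; }
}

// Hypothetical wrapper that records each category as items are added,
// so the set of used categories is available without a second pass.
public class TrackedCollection
{
    private readonly List<Foo> items = new List<Foo>();
    private readonly HashSet<string> usedCategories = new HashSet<string>();

    public void Add(Foo item)
    {
        items.Add(item);
        // HashSet<T>.Add does nothing if the value is already present.
        usedCategories.Add(item.Category);
    }

    public List<string> GetUsedCategories()
    {
        return usedCategories.ToList();
    }
}
```

Note that this sketch does not handle removing items. If items can be removed, you would need a count per category (for example a Dictionary<string, int>) instead of a plain set.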

Check whether Mark Byers' solution is fast enough for you, and only worry about its performance if it isn't.

Jens