I have an IEnumerable of items that I would like to group by category. Each item carries its associated categories as a List, so a single item can belong to multiple categories.

var categories = numbers.SelectMany(x => x.Categories).Distinct();
var query = 
      from cat in categories
      select new {Key = cat, 
                  Values = numbers.Where(n => n.Categories.Contains(cat))};

The code above works, but I was wondering whether there is a more efficient way to do this. It rescans numbers once for every category, so it will likely be slow when numbers contains thousands of values.

I am pretty much asking for a refactoring of the code to be more efficient.

+1  A: 

You can use LINQ's built-in grouping capabilities, which should be faster than running a Contains lookup for every category. However, as with any performance-related question, you should collect performance measurements before deciding how to rewrite code that you know works. It may turn out that there's no performance problem at all for the volumes you will be working with.
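As a rough illustration of collecting such a measurement, here is a minimal sketch that times the query from the question with System.Diagnostics.Stopwatch. The Item class, the 5000-item data set, and the 50 category names are assumptions made up for this example; substitute your real data before drawing any conclusions.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

// Hypothetical item type: the question does not show the real one.
class Item
{
    public int Value { get; set; }
    public List<string> Categories { get; set; }
}

static class Benchmark
{
    static void Main()
    {
        // Synthetic data (assumed sizes): 5000 items, 3 categories each, 50 category names.
        var random = new Random(42);
        var numbers = Enumerable.Range(0, 5000)
            .Select(i => new Item
            {
                Value = i,
                Categories = Enumerable.Range(0, 3)
                    .Select(_ => "cat" + random.Next(50))
                    .ToList()
            })
            .ToList();

        var stopwatch = Stopwatch.StartNew();

        // The query from the question. LINQ is deferred, so ToList() is
        // needed to make the work actually happen inside the timed region.
        var categories = numbers.SelectMany(x => x.Categories).Distinct();
        var groups = (from cat in categories
                      select new
                      {
                          Key = cat,
                          Values = numbers.Where(n => n.Categories.Contains(cat)).ToList()
                      }).ToList();

        stopwatch.Stop();
        Console.WriteLine($"{groups.Count} categories in {stopwatch.Elapsed.TotalMilliseconds} ms");
    }
}
```

A single run like this is noisy, so it is worth repeating the measurement a few times and timing any alternative query on the same data.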

So, here's the code. This isn't tested, but something like it should work:

// Pair every item with each of its categories, then group the items by category.
var result = from n in numbers
             from c in n.Categories
             group n by c;

Each group contains a key and a sequence of values that belong to that key:

foreach( var group in result )
{
    Console.WriteLine( group.Key );
    foreach( var value in group )
        Console.WriteLine( value );
}
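
For a version that can be run end to end, here is a small self-contained sketch of the same idea. It uses method syntax with SelectMany and ToLookup rather than a query expression, which builds the category index in a single pass over the items. The Item class and the sample data are hypothetical, since the question does not show the real types.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical item type, assumed for this example only.
class Item
{
    public string Name { get; set; }
    public List<string> Categories { get; set; }
}

static class Program
{
    static void Main()
    {
        // Made-up sample data: each item belongs to one or more categories.
        var numbers = new List<Item>
        {
            new Item { Name = "one",   Categories = new List<string> { "odd" } },
            new Item { Name = "two",   Categories = new List<string> { "even", "prime" } },
            new Item { Name = "three", Categories = new List<string> { "odd", "prime" } },
        };

        // Flatten to (category, item) pairs, then index the items by category.
        ILookup<string, Item> byCategory = numbers
            .SelectMany(n => n.Categories, (n, c) => new { Category = c, Item = n })
            .ToLookup(p => p.Category, p => p.Item);

        // Each grouping exposes its Key and enumerates the items in that category.
        foreach (var group in byCategory)
        {
            Console.WriteLine(group.Key + ": " + string.Join(", ", group.Select(i => i.Name)));
        }
    }
}
```

Unlike a deferred query, ToLookup runs immediately and keeps the result in memory, so byCategory["prime"] can be read repeatedly without regrouping.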
LBushkin