Imagine you have a large dataset that may or may not be filtered by a particular condition on its elements, and that condition is expensive to calculate. When the dataset is not filtered, the elements are grouped by the value of that condition, so the condition is calculated once per element.

However, in the case where the filtering has taken place, although the subsequent code still expects to see an IEnumerable<IGrouping<TKey, TElement>> collection, it doesn't make sense to perform a GroupBy operation that would result in the condition being re-evaluated a second time for each element. Instead, I would like to be able to create an IEnumerable<IGrouping<TKey, TElement>> by wrapping the filtered results appropriately, and thus avoiding yet another evaluation of the condition.

Other than implementing my own class that provides the IGrouping interface, is there any other way I can implement this optimization? Are there existing LINQ methods to support this that would give me the IEnumerable<IGrouping<TKey, TElement>> result? Is there another way that I haven't considered?

A: 

What about putting the result into a Lookup and using that from then on?

var lookup = data.ToLookup(i => Foo(i));
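To make the suggestion concrete, here is a minimal sketch. The `Foo` body, the `Evaluations` counter and the sample data are hypothetical stand-ins, not anything from the question. It relies on two documented facts: `ToLookup` executes immediately, and `ILookup<TKey, TElement>` is declared as extending `IEnumerable<IGrouping<TKey, TElement>>`, so the same lookup can serve both the unfiltered and the filtered case.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LookupSketch
{
    // Counts how often the (hypothetical) expensive condition runs.
    public static int Evaluations;

    // Hypothetical stand-in for the expensive condition.
    public static bool Foo(int i)
    {
        Evaluations++;
        return i % 2 == 0;
    }

    public static void Main()
    {
        var data = Enumerable.Range(1, 6);

        // ToLookup is not deferred: Foo runs once per element, right here.
        ILookup<bool, int> lookup = data.ToLookup(i => Foo(i));

        // Unfiltered case: the lookup is itself a sequence of groupings.
        IEnumerable<IGrouping<bool, int>> allGroups = lookup;

        // Filtered case: index by key; Foo is not called again.
        IEnumerable<int> matching = lookup[true];

        Console.WriteLine("groups: " + allGroups.Count());
        Console.WriteLine("matching: " + string.Join(", ", matching));
        Console.WriteLine("evaluations: " + Evaluations);
    }
}
```

The trade-off is that the lookup holds every element in memory, including groups that a filtered query never reads.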
Daniel Brückner
A lookup doesn't implement IEnumerable<IGrouping<TKey, TElement>> unfortunately. I suppose I could put both the filtered and non-filtered groups in a lookup, but I was hoping to avoid extra processing on the filtered list as well as avoiding any changes to the subsequent code. I'll look into it and post back.
Jeff Yates
+2  A: 

"the condition is calculated once"

I hope those keys are still around somewhere...

If your data was in some structure like this:

public class CustomGroup<T, U>
{
  // Members must be public to be readable from the query below.
  public T Key { get; set; }
  public IEnumerable<U> GroupMembers { get; set; }
}

You could project such items with a query like this:

var result = customGroups
  .SelectMany(cg => cg.GroupMembers, (cg, z) => new {Key = cg.Key, Value = z})
  .GroupBy(x => x.Key, x => x.Value);
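As a runnable sketch of that projection: the `Regroup` helper name and the sample data are made up for illustration, and `CustomGroup` is declared with public members so the query can read them. The key is copied from the stored group rather than recomputed, but `GroupBy` still has to walk every element once.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class CustomGroup<T, U>
{
    public T Key { get; set; }
    public IEnumerable<U> GroupMembers { get; set; }
}

public static class RegroupSketch
{
    // Hypothetical helper: flattens the stored groups to (Key, Value) pairs,
    // then regroups them by the key that was already known.
    public static IEnumerable<IGrouping<T, U>> Regroup<T, U>(
        IEnumerable<CustomGroup<T, U>> customGroups)
    {
        return customGroups
            .SelectMany(cg => cg.GroupMembers, (cg, z) => new { Key = cg.Key, Value = z })
            .GroupBy(x => x.Key, x => x.Value);
    }

    public static void Main()
    {
        // Illustrative data only.
        var customGroups = new List<CustomGroup<string, int>>
        {
            new CustomGroup<string, int> { Key = "even", GroupMembers = new[] { 2, 4 } },
            new CustomGroup<string, int> { Key = "odd", GroupMembers = new[] { 1, 3 } }
        };

        foreach (var group in Regroup(customGroups))
        {
            Console.WriteLine(group.Key + ": " + string.Join(", ", group));
        }
    }
}
```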
David B
I cannot guarantee the keys remain - it's a complex query using structures that I have not written and therefore cannot rely on for any type of caching. However, you have inspired a solution I think may work - it seems so obvious now.
Jeff Yates
+1 for the inspiration. Thanks!
Jeff Yates
A: 

Inspired by David B's answer, I have come up with a simple solution. So simple that I have no idea how I missed it.

In order to perform the filtering, I obviously need to know what value of the condition I am filtering by. Therefore, given a condition, c, I can just project the filtered list as:

filteredList.GroupBy(x => c)

This avoids any recalculation of properties on the elements (represented by x).
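Here is a small sketch of the idea, assuming a hypothetical `Category` function as the expensive condition and a counter to show how often it runs. The names and data are illustrative only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ConstantKeySketch
{
    // Counts how often the (hypothetical) expensive condition runs.
    public static int Evaluations;

    // Hypothetical stand-in for the expensive condition.
    public static string Category(int x)
    {
        Evaluations++;
        return x % 2 == 0 ? "even" : "odd";
    }

    public static IEnumerable<IGrouping<string, int>> FilterAndGroup(
        IEnumerable<int> data, string c)
    {
        // The filter is the only place the condition is evaluated.
        var filteredList = data.Where(x => Category(x) == c);

        // Every surviving element is known to have the value c,
        // so c is used directly as the key.
        return filteredList.GroupBy(x => c);
    }

    public static void Main()
    {
        var groups = FilterAndGroup(Enumerable.Range(1, 6), "even").ToList();

        foreach (var group in groups)
        {
            Console.WriteLine(group.Key + ": " + string.Join(", ", group));
        }
        Console.WriteLine("evaluations: " + Evaluations);
    }
}
```

One thing to keep in mind: if nothing passes the filter, `GroupBy` yields no groups at all rather than a single empty group, so the subsequent code should cope with an empty sequence.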

Another solution I realized would work is to reverse the order of my query and perform the grouping before the filtering. This too would mean the condition is only evaluated once per element, although it would unnecessarily allocate groupings that I wouldn't subsequently use.
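The reversed ordering might look like the sketch below. Again, `Category`, the counter and the sample data are hypothetical; the point is only that the filter compares the precomputed group key instead of re-running the condition.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GroupThenFilterSketch
{
    // Counts how often the (hypothetical) expensive condition runs.
    public static int Evaluations;

    // Hypothetical stand-in for the expensive condition.
    public static string Category(int x)
    {
        Evaluations++;
        return x % 2 == 0 ? "even" : "odd";
    }

    public static IEnumerable<IGrouping<string, int>> GroupThenFilter(
        IEnumerable<int> data, string c)
    {
        return data
            .GroupBy(x => Category(x))   // condition evaluated once per element
            .Where(g => g.Key == c);     // filtering only compares stored keys
    }

    public static void Main()
    {
        var groups = GroupThenFilter(Enumerable.Range(1, 6), "odd").ToList();

        foreach (var group in groups)
        {
            Console.WriteLine(group.Key + ": " + string.Join(", ", group));
        }
        Console.WriteLine("evaluations: " + Evaluations);
    }
}
```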

Jeff Yates
I should add that this does mean one loop through all the items to group them, which I would still like to avoid if I can. I may still create my own grouping class so that I can avoid this.
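For reference, a custom grouping class can be quite small. This is only a sketch under my own naming (`Grouping` and the sample values are hypothetical): it wraps an existing sequence with a known key, so no extra pass over the items is needed to build it.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Minimal IGrouping implementation that wraps an existing sequence.
public sealed class Grouping<TKey, TElement> : IGrouping<TKey, TElement>
{
    private readonly IEnumerable<TElement> _elements;

    public Grouping(TKey key, IEnumerable<TElement> elements)
    {
        Key = key;
        _elements = elements;
    }

    public TKey Key { get; }

    public IEnumerator<TElement> GetEnumerator() => _elements.GetEnumerator();

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

public static class GroupingSketch
{
    public static void Main()
    {
        // Assume filteredList already holds only elements whose condition is c.
        string c = "even";
        IEnumerable<int> filteredList = new[] { 2, 4, 6 };

        // A single-element sequence of groupings, built without iterating the items.
        IEnumerable<IGrouping<string, int>> result =
            new IGrouping<string, int>[] { new Grouping<string, int>(c, filteredList) };

        foreach (var group in result)
        {
            Console.WriteLine(group.Key + ": " + string.Join(", ", group));
        }
    }
}
```

Because the wrapper only holds a reference to the source, a deferred `filteredList` is re-run each time the group is enumerated; materializing it first (for example with `ToList`) avoids that at the cost of the single pass.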
Jeff Yates