views:

95

answers:

2

I was wondering how could I improve the performance of the following code:

public class MyObject
{
    public int Year { get; set; }
}

//In my case I have 30000
IEnumerable<MyObject> data = MethodThatReturnsManyMyObjects(); 

var groupedByYear = data.GroupBy(x => x.Year); 

//Here is the where it takes around 5 seconds
foreach (var group in groupedByYear) 
    //do something here.

The idea is to get a set of objects with unique year values. In my scenario there are only 6 years included in the 30000 items in the list so the foreach loop will be executed 6 times only. So we have many items needing to be grouped in a few groups.

Using the .Distinct() with an explicit IEqualityComparer would be an alternative but somehow I feel that it wont make any difference.

I can understand if 30000 items is too much and that i should be happy with the 5 seconds I get, but I was wondering if the above can be imporved performance wise.

Thanks.

EDIT: The answers below made me dig a bit deeper only to realize that the 5 seconds I am getting only heppens when the data are loaded into memory from the DB. The delay was disguised inside the foreach loop as the deferred execution of the IEnumerable delayed it until that point confusing me to assume that probably the GroupBy() could be refactored to something more performant.

The question still stands though, is the GroupBy() command the optimal way to achieve the best performance in such cases?

+2  A: 

That definitely shouldn't take that long. Is this running under the debugger, or not? Are any exceptions being thrown? Does the Year property perform any calculations in real life? It should execute this almost instantly, to be honest.

Do you have a short but complete program which demonstrates it taking a long time? (If not, I'll try to come up with one myself to get some sample timings.)

Note that if MethodThatReturnsManyMyObjects is using deferred execution for the iterator, that could be the culprit - how long does it take if you call data.ToList() for example?

Jon Skeet
@Jon Thanks for the insight. I am currently trying to get better benchamrk times to provide. I'll have something to show in a bit.
tolism7
I suspect that deferred evaluation is the issue. In a simple example I made based on the criteria provided, it is almost instant with 350,000 samples.
Ian P
tolism7
+1  A: 

I'm curious to know: does your MethodThatReturnsManyMyObjects provide lazy evaluation (i.e., using the yield keyword)? If so, that could be your culprit, rather than the call to GroupBy:

// if MethodThatReturnsManyMyObjects uses yield, then
// it won't be executed until enumeration
IEnumerable<MyObject> data = MethodThatReturnsManyMyObjects();

// still not executed
var groupedByYear = data.GroupBy(x => x.Year); 

// finally executed here
foreach (var group in groupedByYear)
    // ...
Dan Tao