The problem with grouping and ordering operations is that they require knowledge of the entire collection before they can produce any output. The same is true of aggregates like Min, Max, Sum, and Average. Some of these operations cannot interrogate the true type of the IEnumerable they are passed, and even when they can it doesn't help, because they are "destructive" in nature and must build a working copy. When you chain these operations together, you end up with at least two copies of the full enumerable alive at once: the one produced by the previous method, which the current method is iterating, and the one being generated by the current method. Any copy that is still referenced outside the query (such as the source enumerable) also stays in memory, and copies that have become orphaned linger until the garbage collector gets around to reclaiming them. For a large source enumerable, all of this can put massive pressure on the heap.
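As a rough sketch of that chaining cost (the data shape and names here are made up purely for illustration): OrderBy has to buffer the whole sequence before it can yield its first element, and GroupBy buffers everything again to build its groups, so the source plus two full working copies coexist on the heap while the query is drained.

    using System;
    using System.Linq;

    class BufferingDemo
    {
        static void Main()
        {
            // Hypothetical large in-memory source.
            var readings = Enumerable.Range(0, 1_000_000)
                .Select(i => (SensorId: i % 10, Timestamp: i, Value: (double)i))
                .ToList();

            // OrderBy buffers the entire sequence before yielding anything,
            // and GroupBy buffers it all again to build its groups, so the
            // source list and two full working copies are live at the same
            // time when ToList() drains the query.
            var grouped = readings
                .OrderBy(r => r.Timestamp)   // first full working copy
                .GroupBy(r => r.SensorId)    // second full working copy
                .ToList();                   // materialized result

            Console.WriteLine(grouped.Count);
        }
    }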
In addition, nesting aggregates inside clauses can make a query expensive very quickly. Unlike a DBMS, which can build a "query plan", LINQ isn't that smart. Min(), for example, requires iterating the whole enumerable to find the smallest value of the specified projection. When that's a criterion in a Where clause, a good DBMS will compute the value once per context and then inline it in subsequent evaluations as needed. LINQ just runs the extension method every time it is called, so a condition like enumerable.Where(x => x.Value == enumerable.Min(x2 => x2.Value)) is an O(N^2) operation just to evaluate the filter. Add multiple levels of grouping and the Big-O can easily climb to high-polynomial complexity.
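To make the cost concrete, here is a minimal sketch of that pattern (the values are hypothetical); Min() re-scans the whole source for every element the Where predicate tests:

    using System;
    using System.Linq;

    class NestedAggregateDemo
    {
        static void Main()
        {
            // Hypothetical source of N values.
            var values = Enumerable.Range(0, 10_000).Select(i => i % 500).ToList();

            // For every one of the N elements Where examines, the predicate
            // calls Min(), which scans all N elements again: roughly N * N
            // work just to evaluate the filter.
            var smallest = values.Where(v => v == values.Min()).ToList();

            Console.WriteLine(smallest.Count);
        }
    }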
Generally, you can reduce query time by performing the same optimizations a DBMS would apply to the same query. If the value of an aggregate can be known for the full scope of the query (e.g. result = source.Where(s => s.Value == source.Min(x => x.Value))), evaluate it once into a variable using a let clause (or a separate statement outside the query), and replace the Min() calls with that alias. Iterating the enumerable twice is far cheaper than iterating it once per element, especially if the enumerable stays in memory between iterations.
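For an in-memory sequence that hoisting looks roughly like this (values and names are hypothetical):

    using System;
    using System.Linq;

    class HoistedAggregateDemo
    {
        static void Main()
        {
            var source = Enumerable.Range(0, 10_000).Select(i => i % 500).ToList();

            // Method syntax: compute the aggregate once, outside the predicate.
            var min = source.Min();                        // one full pass
            var viaVariable = source.Where(v => v == min)  // one more pass
                                    .ToList();

            // Query syntax with `let`. Note that in LINQ to Objects a `let`
            // expression is still evaluated once per element, so referencing
            // the hoisted local (rather than source.Min()) is what guarantees
            // the aggregate is paid for only once.
            var viaLet = (from v in source
                          let m = min
                          where v == m
                          select v).ToList();

            Console.WriteLine(viaVariable.Count == viaLet.Count);
        }
    }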
Also, make sure your query order reduces the sample space as much, and as cheaply, as possible before you start grouping. You can usually make an educated guess about which conditions are expensive to evaluate and put the cheap ones first, such as Where(s => s.Value < threshold).Where(s => s.Value == source.Min(x => x.Value)), or more concisely Where(s => s.Value < threshold && s.Value == source.Min(x => x.Value)) (the second works in C# because && short-circuits, but not all languages evaluate conditions that way). That cuts the number of Min() evaluations down to the number of elements meeting the first criterion. You can use existing criteria to do the same thing, wherever criteria A and B are independent enough that A && B == B && A.
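A sketch of that ordering (the threshold and values are hypothetical): the cheap comparison runs first, so the expensive aggregate is only evaluated for elements that survive it.

    using System;
    using System.Linq;

    class FilterOrderingDemo
    {
        static void Main()
        {
            var source = Enumerable.Range(0, 10_000).Select(i => i % 500).ToList();
            var threshold = 10;   // hypothetical cut-off

            // The cheap comparison runs first and && short-circuits, so
            // source.Min() is only evaluated for the few elements already
            // below the threshold. (Hoisting Min() into a variable, as shown
            // earlier, is cheaper still; this mirrors the example in the text.)
            var filtered = source
                .Where(v => v < threshold && v == source.Min())
                .ToList();

            Console.WriteLine(filtered.Count);
        }
    }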