I recently needed to perform an operation on a grouped, slowly yielding LINQ query.
However, GroupBy loses its laziness: you have to wait for the entire sequence to finish before any groups are returned. Logically, this doesn't seem like the best solution to me, since a group could be returned as soon as it is first encountered.
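To make the buffering concrete, here is a small sketch (plain LINQ to Objects, no Rx; the class and member names are hypothetical) that counts how many items Enumerable.GroupBy pulls from the source before it hands out the first group:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GroupByBufferingDemo
{
    // How many items have been pulled from the source so far.
    public static int Pulled;

    // Hypothetical source that records each item it yields.
    static IEnumerable<int> Source(int count)
    {
        for (int i = 0; i < count; i++)
        {
            Pulled++;
            yield return i;
        }
    }

    // Returns how many source items had been read when the first group became available.
    public static int PulledBeforeFirstGroup()
    {
        Pulled = 0;
        Source(10).GroupBy(i => i % 2 == 0).First();
        return Pulled;
    }

    public static void Main()
    {
        Console.WriteLine(PulledBeforeFirstGroup()); // prints 10
    }
}
```

With 10 source items this prints 10: the first group only becomes available after the whole source has been read, even though its key was known after the first item.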
I have written the following code, which seems to work fine. I'm looking for pitfalls and general improvements, as well as thoughts on the concept itself (e.g. can or should a GroupBy method return groups as soon as possible?).
```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

public static class EnumerableExtensions
{
    public static IEnumerable<KeyValuePair<R, IEnumerable<T>>> GroupByLazy<T, R>(this IEnumerable<T> source, Func<T, R> keySelector)
    {
        var dic = new Dictionary<R, BlockingCollection<T>>();
        foreach (var item in source)
        {
            var key = keySelector(item);
            BlockingCollection<T> items;
            if (!dic.TryGetValue(key, out items))
            {
                // first item with this key: create the group and hand it out immediately
                items = new BlockingCollection<T>();
                items.Add(item);
                dic.Add(key, items);
                yield return new KeyValuePair<R, IEnumerable<T>>(key, items);
            }
            else
            {
                items.Add(item);
            }
        }
        // mark all the groups as completed so that consumers waiting on
        // GetConsumingEnumerable() can finish
        foreach (var groupedValues in dic.Values)
            groupedValues.CompleteAdding();
    }
}
```
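Whether the CompleteAdding() call at the end of GroupByLazy matters depends on how a BlockingCollection<T> is enumerated. As far as I understand the API, enumerating it through IEnumerable<T> (which is how GroupByLazy exposes each group) returns a snapshot of the items added so far and does not wait, whereas GetConsumingEnumerable() removes items and blocks until CompleteAdding() has been called. If that reading is right, a consumer that enumerates a group while the source is still producing sees only the items added so far. A minimal sketch of the difference (class and method names are hypothetical):

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class BlockingCollectionDemo
{
    // Enumerating through IEnumerable<T> takes a snapshot of the current items
    // and returns at once, even though CompleteAdding() was never called.
    public static int SnapshotCount()
    {
        var items = new BlockingCollection<int>();
        items.Add(1);
        items.Add(2);
        int seen = 0;
        foreach (var item in (IEnumerable<int>)items)
            seen++;
        return seen;
    }

    // GetConsumingEnumerable() removes items as it goes and only finishes once
    // CompleteAdding() has been called and the collection is empty.
    public static (int Consumed, int Left) ConsumeAll()
    {
        var items = new BlockingCollection<int>();
        var producer = Task.Run(() =>
        {
            for (int i = 0; i < 3; i++)
                items.Add(i);
            items.CompleteAdding(); // without this, the loop below would block forever
        });
        int consumed = 0;
        foreach (var item in items.GetConsumingEnumerable())
            consumed++;
        producer.Wait();
        return (consumed, items.Count);
    }

    public static void Main()
    {
        Console.WriteLine(SnapshotCount()); // prints 2
        var (consumed, left) = ConsumeAll();
        Console.WriteLine($"{consumed} consumed, {left} left"); // prints "3 consumed, 0 left"
    }
}
```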
Simple test:
```csharp
// Observable.Interval and ToEnumerable come from Rx (System.Reactive);
// Do and Run are Interactive Extensions (Ix) operators, not standard LINQ.
using System;
using System.Linq;
using System.Reactive.Linq;

var slowIE = Observable.Interval(TimeSpan.FromSeconds(1)).ToEnumerable().Take(10);
var debug = slowIE.Do(i => Console.WriteLine("\teval " + i));
var gl = debug.GroupByLazy(i => i % 2 == 0);
var g = debug.GroupBy(i => i % 2 == 0);
Console.WriteLine("Lazy:");
gl.Run(i => Console.WriteLine("Group returned: " + i.Key));
Console.WriteLine(gl.Single(i => i.Key).Value.Count());
Console.WriteLine("NonLazy:");
g.Run(i => Console.WriteLine("Group returned: " + i.Key));
Console.WriteLine(g.Single(i => i.Key).Count());
Console.ReadLine();
```
which prints:
```
Lazy:
    eval 0
Group returned: True
    eval 1
Group returned: False
    eval 2
    eval 3
    eval 4
    eval 5
    eval 6
    eval 7
    eval 8
    eval 9
NonLazy:
    eval 0
    eval 1
    eval 2
    eval 3
    eval 4
    eval 5
    eval 6
    eval 7
    eval 8
    eval 9
Group returned: True
Group returned: False
```
As you can see, with GroupByLazy the groups are returned as soon as they are first encountered, so they can be acted upon without waiting for the entire sequence to be grouped.
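Because groups are handed out before the source is exhausted, a caller may also stop enumerating early, for example with First() or Take(). Code placed after the foreach in an iterator only runs once the source has been read to the end, whereas a finally block also runs when the consumer disposes the enumerator early, which foreach and the LINQ operators do. The minimal sketch below (hypothetical names, independent of GroupByLazy) shows that behaviour; a finally block would be one possible place for the CompleteAdding() loop:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class IteratorCleanupDemo
{
    public static bool CleanedUp;

    // Hypothetical iterator standing in for GroupByLazy: the finally block
    // plays the role of the CompleteAdding() loop.
    static IEnumerable<int> Groups()
    {
        try
        {
            for (int i = 0; ; i++)
                yield return i;
        }
        finally
        {
            CleanedUp = true;
        }
    }

    public static bool CleanupRunsOnEarlyExit()
    {
        CleanedUp = false;
        Groups().First(); // reads one element, then disposes the enumerator
        return CleanedUp;
    }

    public static void Main()
    {
        Console.WriteLine(CleanupRunsOnEarlyExit()); // prints True
    }
}
```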
Thoughts?
Edit: Quick thought: I don't think "lazy" is the right term. I'm not a native speaker; what term am I actually looking for?