Hi there,

I have a collection of objects and need to take batches of 100 objects and do some work with them until there are no objects left to process.

Rather than looping through each item, grabbing 100 elements, then the next hundred, and so on, is there a nicer way of doing this with LINQ?

Many thanks

A: 
static void test(IEnumerable<object> objects)
{
    while (objects.Any())
    {
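        foreach (object o in objects.Take(100))
        {
            // process o
        }
        // Skip is lazy: this wraps the previous sequence, so later passes may walk the skipped items again.
        objects = objects.Skip(100);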
    }
}

:)

Andrey
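To see why repeated `Take`/`Skip` can get expensive, here is a minimal sketch that counts how many elements are pulled from the source. `SkipTakeCost`, `CountingSource`, and `Pulls` are hypothetical names introduced only for this illustration, and the source is assumed to be a plain iterator with no indexed access.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class SkipTakeCost
{
    // Number of elements pulled from the source so far (illustration only).
    public static int Pulls;

    // A plain iterator source that counts every element it hands out.
    private static IEnumerable<int> CountingSource(int n)
    {
        for (int i = 0; i < n; i++)
        {
            Pulls++;
            yield return i;
        }
    }

    // Runs the Take/Skip loop from the answer above and returns how many items were processed.
    public static int Process(int n, int batchSize)
    {
        Pulls = 0;
        IEnumerable<int> objects = CountingSource(n);
        int processed = 0;
        while (objects.Any())
        {
            foreach (int o in objects.Take(batchSize))
            {
                processed++;
            }
            // Each pass stacks one more lazy Skip on top of the original source.
            objects = objects.Skip(batchSize);
        }
        return processed;
    }
}
```

Calling `SkipTakeCost.Process(1000, 100)` processes all 1000 items, but `SkipTakeCost.Pulls` ends up above 1000, because each pass has to walk the already-skipped items again before reaching its own batch.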
+2  A: 
int batchSize = 100;
var batched = yourCollection.Select((x, i) => new { Val = x, Idx = i })
                            .GroupBy(x => x.Idx / batchSize,
                                     (k, g) => g.Select(x => x.Val));

// and then to demonstrate...
foreach (var batch in batched)
{
    Console.WriteLine("Processing batch...");

    foreach (var item in batch)
    {
        Console.WriteLine("Processing item: " + item);
    }
}
LukeH
This is very inefficient. `GroupBy` is an expensive operation.
Andrey
@Andrey: I suggest that you benchmark your own answer against the `GroupBy` version before you make any claims about inefficiency. You might be surprised by the results.
LukeH
@LukeH you are right, your method is more performant (more than 10x)!
Andrey
@Andrey: There are definitely more performant ways than `GroupBy` to do this. The obvious technique would be looping as the OP mentions in the question, or maybe something like Lee's or Foole's answers. I suggested `GroupBy` because it's succinct and doesn't need any extra helper code, not because it's the most efficient way to do this.
LukeH
@LukeH yes, there is more efficient code, but I was surprised to see that `GroupBy` is faster
Andrey
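To illustrate how the `GroupBy` answer above forms its batches: integer division of each element's index by the batch size gives the same key to every run of `batchSize` consecutive elements. A minimal sketch, where `GroupByBatchDemo` and `ToBatches` are hypothetical names used only for this example and each batch is materialised with `ToList` for easy inspection:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class GroupByBatchDemo
{
    // Wraps the GroupBy expression from the answer above in a helper (hypothetical name).
    public static List<List<T>> ToBatches<T>(IEnumerable<T> source, int batchSize)
    {
        return source
            // Pair every element with its zero-based position.
            .Select((x, i) => new { Val = x, Idx = i })
            // Indexes 0..batchSize-1 map to key 0, the next batchSize indexes to key 1, and so on.
            .GroupBy(x => x.Idx / batchSize,
                     (k, g) => g.Select(x => x.Val).ToList())
            .ToList();
    }
}
```

For example, `GroupByBatchDemo.ToBatches(Enumerable.Range(1, 7), 3)` gives three batches: 1, 2, 3, then 4, 5, 6, then 7 on its own.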
A: 

I don't think LINQ is really suitable for this sort of processing: it is mainly useful for performing operations on whole sequences rather than splitting or modifying them. I would do this by accessing the underlying IEnumerator<T>, since any method using Take and Skip is going to be quite inefficient.

public static void Batch<T>(this IEnumerable<T> items, int batchSize, Action<IEnumerable<T>> batchAction)
{
    if (batchSize < 1) throw new ArgumentException();

    List<T> buffer = new List<T>();
    using (var enumerator = (items ?? Enumerable.Empty<T>()).GetEnumerator())
    {
        while (enumerator.MoveNext())
        {
            buffer.Add(enumerator.Current);
            if (buffer.Count == batchSize)
            {
                // The same list is reused for every batch, so batchAction must not keep a reference to it.
                batchAction(buffer);
                buffer.Clear();
            }
        }

        //execute for remaining items
        if (buffer.Count > 0)
        {
            batchAction(buffer);
        }
    }
}
Lee
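A usage sketch for the `Batch` method above. The method is copied into a hypothetical `BatchExtensions` class so the block compiles on its own, and `BatchUsage.BatchSizes` is an illustrative helper that is not part of the original answer. Because the buffer is reused between calls, the callback reads what it needs immediately instead of storing the batch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BatchExtensions
{
    // Copy of the Batch method from the answer above, so this block compiles on its own.
    public static void Batch<T>(this IEnumerable<T> items, int batchSize, Action<IEnumerable<T>> batchAction)
    {
        if (batchSize < 1) throw new ArgumentException();

        List<T> buffer = new List<T>();
        using (var enumerator = (items ?? Enumerable.Empty<T>()).GetEnumerator())
        {
            while (enumerator.MoveNext())
            {
                buffer.Add(enumerator.Current);
                if (buffer.Count == batchSize)
                {
                    batchAction(buffer);
                    buffer.Clear();
                }
            }

            // Hand over whatever is left in a final, shorter batch.
            if (buffer.Count > 0)
            {
                batchAction(buffer);
            }
        }
    }
}

public static class BatchUsage
{
    // Returns the size of each batch in order (illustrative helper).
    public static List<int> BatchSizes(IEnumerable<int> items, int batchSize)
    {
        var sizes = new List<int>();
        // Read the count inside the callback; the batch itself is cleared after the call returns.
        items.Batch(batchSize, batch => sizes.Add(batch.Count()));
        return sizes;
    }
}
```

With 250 items and a batch size of 100, `BatchUsage.BatchSizes(Enumerable.Range(1, 250), 100)` returns 100, 100, 50. If a callback needs to keep a batch around, copying it first (for example with `batch.ToList()`) avoids seeing it emptied later.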
A: 

This will partition the sequence into a sequence of lists, each holding up to the number of items you specify (the last one may be shorter).

public static IEnumerable<IEnumerable<T>> Partition<T>(this IEnumerable<T> source, int size)
{
    int i = 0;
    List<T> list = new List<T>(size);
    foreach (T item in source)
    {
        list.Add(item);
        if (++i == size)
        {
            yield return list;
            list = new List<T>(size);
            i = 0;
        }
    }
    if (list.Count > 0)
        yield return list;
}
Foole
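A usage sketch for the `Partition` method above, which is copied into a hypothetical `PartitionExtensions` class so the block compiles on its own. `PartitionUsage.BatchSums` is an illustrative helper, not part of the original answer. Since every batch is a fresh list and batches are yielded lazily, the result composes with other LINQ operators.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class PartitionExtensions
{
    // Copy of the Partition method from the answer above, so this block compiles on its own.
    public static IEnumerable<IEnumerable<T>> Partition<T>(this IEnumerable<T> source, int size)
    {
        int i = 0;
        List<T> list = new List<T>(size);
        foreach (T item in source)
        {
            list.Add(item);
            if (++i == size)
            {
                yield return list;
                list = new List<T>(size);
                i = 0;
            }
        }
        if (list.Count > 0)
            yield return list;
    }
}

public static class PartitionUsage
{
    // Sums each batch; batches are only built as the result is enumerated (illustrative helper).
    public static List<int> BatchSums(IEnumerable<int> items, int size)
    {
        return items.Partition(size)
                    .Select(batch => batch.Sum())
                    .ToList();
    }
}
```

For example, `PartitionUsage.BatchSums(Enumerable.Range(1, 7), 3)` returns 6, 15, 7, one sum per batch.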