How would one take a List (using LINQ) and break it into a List of Lists partitioning the original list on every 8th entry?

I imagine something like this would involve Skip and/or Take, but I'm still pretty new to LINQ.

Edit: Using C# / .NET 3.5

+6  A: 

We have just such a method in MoreLINQ as the Batch method:

// As IEnumerable<IEnumerable<T>>
var items = list.Batch(8);

or

// As IEnumerable<List<T>>
var items = list.Batch(8, seq => seq.ToList());
Jon Skeet
Cool, this is implemented very nicely, including a resultSelector (important for manipulating/ordering the inner list).
Kirk Woll
Phew. I thought perhaps I was a bit daft in not being able to figure this out. Good to see that there are some things that are "missing" from regular LINQ-to-Objects. :)
Pretzel
@Pretzel: It's not that this is impossible using plain old LINQ ... it's just that it's neither terribly efficient nor easy to understand. See my answer for a "plain LINQ" example.
LBushkin
+1, Thanks for the link to this library. I'll see if I can use it in future projects.
Pretzel
+3  A: 

Use the following extension method to break the input into subsets:

using System.Collections.Generic;
using System.Linq;

public static class IEnumerableExtensions
{
    public static IEnumerable<List<T>> InSetsOf<T>(this IEnumerable<T> source, int max)
    {
        List<T> toReturn = new List<T>(max);
        foreach (var item in source)
        {
            toReturn.Add(item);
            if (toReturn.Count == max)
            {
                // Chunk is full: hand it out and start a fresh one.
                yield return toReturn;
                toReturn = new List<T>(max);
            }
        }
        // Emit the final, partially filled chunk (if any).
        if (toReturn.Any())
        {
            yield return toReturn;
        }
    }
}
Handcraftsman
I'm going to try this now as this seems quite clever... The thought of "yield return" popped into my head while mulling this over, but I couldn't see a clear way to do it... I'll let you know how this works for me.
Pretzel
Wow! That's really frickin' cool. I'm going with this! Thanks for the help! :-)
Pretzel
+2  A: 

You're better off using a library like MoreLinq, but if you really had to do this using "plain LINQ", you can use GroupBy:

var sequence = new[] {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16};

var result = sequence.Select((x, i) => new {Group = i/8, Value = x})
                     .GroupBy(item => item.Group, g => g.Value)
                     .Select(g => g.Where(x => true));

// result is: { {1,2,3,4,5,6,7,8}, {9,10,11,12,13,14,15,16} }

Basically, we use the overload of Select() that provides the index of each value being consumed, and divide that index by 8 (integer division) to identify which group each value belongs to. Then we group the sequence by this grouping key. The last Select just reduces the IGrouping<> down to an IEnumerable<IEnumerable<T>> (and isn't strictly necessary since IGrouping is an IEnumerable).

It's easy enough to turn this into a reusable method by factoring out the constant 8 in the example and replacing it with a parameter. It's not necessarily the most elegant solution, and it is no longer a lazy, streaming solution ... but it does work.
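
As a rough sketch of that refactoring (the class name GroupByChunking and the method name ChunkByIndex are made up for illustration and are not part of LINQ or MoreLINQ), the constant 8 becomes a size parameter. The final Select materializes each group as a List<T>, which also avoids relying on generic variance under .NET 3.5:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GroupByChunking
{
    // Hypothetical helper: the same index / size grouping as the example,
    // with the constant 8 replaced by a parameter.
    // GroupBy reads the entire source before the first chunk is produced.
    public static IEnumerable<List<T>> ChunkByIndex<T>(this IEnumerable<T> source, int size)
    {
        if (source == null) throw new ArgumentNullException("source");
        if (size <= 0) throw new ArgumentOutOfRangeException("size");

        return source.Select((x, i) => new { Group = i / size, Value = x })
                     .GroupBy(item => item.Group, item => item.Value)
                     .Select(g => g.ToList());
    }
}
```

Calling Enumerable.Range(1, 10).ChunkByIndex(4) should give three lists: two of four items and a final one holding 9 and 10.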

You could also write your own extension method using iterator blocks (yield return) which could give you better performance and use less memory than GroupBy. This is what the Batch() method of MoreLinq does IIRC.

LBushkin
Thanks for your input. Yeah, it doesn't seem efficient and as you can imagine, I was struggling to understand how I could do it with regular LINQ. (I'm staring at your answer right now and I really don't understand it very well.) I'll have to fiddle with it more later. (Thanks again!)
Pretzel
The approach using `GroupBy()` breaks down if the sequence you're planning on batching is going to be extremely large (or infinite), because `GroupBy()` has to read the whole sequence before it can yield the first group. As for how it works: it creates an anonymous object which associates each item with its group number (its index divided by `8`, or any other non-zero constant, using integer division), and then groups the items into a series of sequences by that number.
LBushkin
A: 

Take on its own won't be very efficient, because it doesn't remove the entries taken: each later chunk needs a Skip that starts from the beginning of the source again.
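
For comparison only, here is roughly what the Skip/Take version would look like (the names SkipTakeChunking and ChunkWithSkipTake are invented for this sketch). As far as I know, LINQ to Objects in .NET 3.5 implements Skip by enumerating past the skipped items, so chunk k walks over k * size entries before taking any; newer runtimes may optimize this for lists.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SkipTakeChunking
{
    // Hypothetical helper, shown only to illustrate the cost.
    // Every chunk calls Skip on the original list again.
    public static List<List<T>> ChunkWithSkipTake<T>(this List<T> list, int size)
    {
        if (list == null) throw new ArgumentNullException("list");
        if (size <= 0) throw new ArgumentOutOfRangeException("size");

        var result = new List<List<T>>();
        for (int offset = 0; offset < list.Count; offset += size)
        {
            // Skip restarts from the head of the list on each iteration.
            result.Add(list.Skip(offset).Take(size).ToList());
        }
        return result;
    }
}
```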

Why not use a simple loop:

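// Extension method: declare it inside a static class.
public static IEnumerable<IList<T>> Partition<T>(this IEnumerable<T> src, int num)
{
    // Dispose the enumerator once the caller stops iterating.
    using (IEnumerator<T> enu = src.GetEnumerator())
    {
        while (true)
        {
            List<T> result = new List<T>(num);
            for (int i = 0; i < num; i++)
            {
                if (!enu.MoveNext())
                {
                    // Source exhausted: emit the partial chunk, if any, then stop.
                    if (i > 0) yield return result;
                    yield break;
                }
                result.Add(enu.Current);
            }
            yield return result;
        }
    }
}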
Floste
A: 

It's not at all what the original LINQ designers had in mind, but check out this misuse of GroupBy:

public static IEnumerable<IEnumerable<T>> BatchBy<T>(this IEnumerable<T> items, int batchSize)
{
    var count = 0;
    return items.GroupBy(x => (count++ / batchSize)).ToList();
}

[TestMethod]
public void BatchBy_breaks_a_list_into_chunks()
{
    var values = new[] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
    var batches = values.BatchBy(3);
    batches.Count().ShouldEqual(4);
    batches.First().Count().ShouldEqual(3);
    batches.Last().Count().ShouldEqual(1);
}

I think it wins the "golf" prize for this question. The ToList is very important since you want to make sure the grouping has actually been performed before you try doing anything with the output. If you remove the ToList(), the query stays deferred and count is never reset, so every re-enumeration re-runs the grouping with different keys and you will get some weird side effects.
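
To make that side effect concrete, here is a small sketch of the same trick without the ToList() call (BatchByDeferred and FirstBatchSizes are hypothetical names used only for this demonstration). Because the captured counter keeps its value between enumerations, the second pass computes different group keys than the first:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DeferredBatchDemo
{
    // Hypothetical variant of BatchBy WITHOUT ToList(): the grouping is deferred.
    public static IEnumerable<IGrouping<int, T>> BatchByDeferred<T>(this IEnumerable<T> items, int batchSize)
    {
        var count = 0;
        return items.GroupBy(x => count++ / batchSize);
    }

    // Returns the size of the first batch seen on two successive enumerations.
    public static int[] FirstBatchSizes()
    {
        var values = new[] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
        var batches = values.BatchByDeferred(3);

        // First pass: count runs 0..9, so the keys are 0,0,0,1,1,1,2,2,2,3.
        int first = batches.First().Count();
        // Second pass: count resumes at 10, so the keys are 3,3,4,4,4,5,5,5,6,6.
        int second = batches.First().Count();

        return new[] { first, second };
    }
}
```

Here FirstBatchSizes() should return 3 and then 2: the same query yields a differently sized first batch the second time it is enumerated.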

Mel
For the record, Handcraftsman's "yield return"-based version performs much better, but I still like the "Hey, you're not supposed to be doing that" aspect of this code.
Mel
A: 
from b in Enumerable.Range(0,8) select items.Where((x,i) => (i % 8) == b);
James Dunne