tags:

views:

1829

answers:

11

Is there a nice way to split a collection into 'n' parts with LINQ ? Not necessarily even of course

+7  A: 

EDIT: Okay, it looks like I misread the question. I read it as "pieces of length n" rather than "n pieces". Doh! Considering deleting answer...

(Original answer)

I don't believe there's a built-in way of partitioning, although I intend to write one in my set of additions to LINQ to Objects. Marc Gravell has an implementation here although I would probably modify it to return a read-only view:

public static IEnumerable<IEnumerable<T>> Partition<T>
    (this IEnumerable<T> source, int size)
{
    T[] array = null;
    int count = 0;
    foreach (T item in source)
    {
        if (array == null)
        {
            array = new T[size];
        }
        array[count] = item;
        count++;
        if (count == size)
        {
            yield return new ReadOnlyCollection<T>(array);
            array = null;
            count = 0;
        }
    }
    if (array != null)
    {             
        Array.Resize(ref array, count);
        yield return new ReadOnlyCollection<T>(array);
    }
}
Jon Skeet
Darn - beat me to it ;-p
Marc Gravell
You *really* don't like those "array[count++]", eh ;-p
Marc Gravell
Ironically I might well have used it before yesterday. But having written that answer it would have been hypocritical - and looking at both versions I think this *is* slightly easier to read at a glance. At first I used a list instead - Add is even more readable :)
Jon Skeet
Well, I just added a slightly different answer (to address "n pieces" rather than "pieces of length n"), and mixed in parts of your version ;-p
Marc Gravell
+2  A: 

Splitting a collection into n pieces is easy enough; there isn't an in-built LINQ extension method, but you can add your own. The only trick is - you need to know the length up front. So perhaps (by default) consider an extension method on something such as ICollection<T> (which has the Count). Then it is something like my previous answer:

using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;

static class Program { // formatted for vertical space
    static void Main() {
        var items = new[] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };

        int grp = 0;
        foreach (var group in items.Split(3)) {
            Console.WriteLine("Group " + grp++);
            foreach (var item in group) {
                Console.WriteLine(item);
            }
        }
    }
    // fallback option
    public static IEnumerable<IEnumerable<T>> Split<T>(
            this IEnumerable<T> items, int count) {
        return Split<T>(new List<T>(items), count);
    }
    // perferred option
    public static IEnumerable<IEnumerable<T>> Split<T>(
            this ICollection<T> items, int count) {
        if (count <= 0) throw new ArgumentOutOfRangeException("count");
        int size = items.Count / count;
        if ((items.Count % count) != 0) size++;
        T[] array = new T[size];
        int index = 0;
        foreach (T item in items) {
            array[index++] = item;
            if (index == size) {
                yield return new ReadOnlyCollection<T>(array);
                index = 0;
            }
        }
        if (array != null) {
            Array.Resize(ref array, index);
            yield return new ReadOnlyCollection<T>(array);
        }
    }
}
Marc Gravell
For anyone wondering why Marc didn't just use Count() - the above code makes sure that the source sequence is only evaluated once. (I assume this is the reason, anyway.) This can be very important. Of course, it requires more memory to buffer things...
Jon Skeet
yes - I was thinking about "read once" streams, such as iterator blocks, LINQ-to-SQL, etc.
Marc Gravell
That seems awefully complicated. If you really did have a read-once stream it is easier to just dump it into a list and then chop it up by index.
Jonathan Allen
And if the stream is infinite? Or just stupidly long? Often, LINQ-to-Objects is used for things like file parsing, network stream handling, etc - no end is in sight... so buffer-free is hugely desirable.
Marc Gravell
A: 

OK, so this is purely using LINQ, haven't fully tested it but it looks ok and it compiles. Similar to Marc Gravell's but implemented differently.

public static IEnumerable<IEnumerable<T>> Partition<T>
    (this IEnumerable<T> query, int partitions)
{
    // make a single query to get all records
    var all = query.ToList(); 

    // work out how many to take each time
    var take = items.Count / partitions;
    if ((items.Count % take) != 0) take++;

    for(var skip = 0; skip < all.Count; skip += take)
    {
        yield return all.Skip(skip).Take(take);
    }
}

Example usage:

var query = from entity in Context.Entities
            select entity;

return query.Partition(10);
Garry Shutler
Actually, that is more comparable to my Split version that Partition (which takes n items at a time). Very neat, but it reads (and skips) the early records lots of times... and you might have an off-by-one if it isn't cleanly divisible - i.e. Count is 10 and partitions is 3 - you'll return 4 blocks.
Marc Gravell
(I edited to add the missing "static" - needed for extension methods; note that it should probably take `IEnumerable<T>`, which would include `IQueryable<T>` by implication)
Marc Gravell
Need to double check the partitioning and have made it extend IEnumerable<T> instead. Is the skipping each time really that bad?
Garry Shutler
(removed my other comment, since I missed your ToList usage)
Marc Gravell
For small data, no it is fine; but for large volumes of data, it might be. Of course, for large volumes of data there are bigger issues (unrelated to this). ;-p
Marc Gravell
Since you are using a list, why not just use GetRange instead of Skip and Take.
Jonathan Allen
A: 

If order in these parts is not very important you can try this:

int[] array = new int[] { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
int n = 3;

var result =
   array.Select((value, index) => new { Value = value, Index = index }).GroupBy(i => i.Index % n, i => i.Value);

// or
var result2 =
   from i in array.Select((value, index) => new { Value = value, Index = index })
   group i.Value by i.Index % n into g
   select g;

However these can't be cast to IEnumerable<IEnumerable<int>> by some reason...

Dmitriy Matveev
+12  A: 

A pure linq and the simplest solution is as under.

static class LinqExtensions
{
    public static IEnumerable<IEnumerable<T>> Split<T>(this IEnumerable<T> list, int parts)
    {
        int i = 0;
        var splits = from name in list
                     group name by i++ % parts into part
                     select part.AsEnumerable();
        return splits;
    }
}
Hasan Khan
You can do: select part.AsEnumerable() instead of select (IEnumerable<T>)part. It feels more elegant.
tuinstoel
Thanks for the suggestion tuinstoel. It sure looks better now.
Hasan Khan
Doing all those modulus operations can get a bit expensive on long lists.
Jonathan Allen
It would be better to use the Select overload that includes the index.
Marc Gravell
A: 
int[] items = new int[] { 0,1,2,3,4,5,6,7,8,9, 10 };

int itemIndex = 0;
int groupSize = 2;
int nextGroup = groupSize;

var seqItems = from aItem in items
               group aItem by 
                            (itemIndex++ < nextGroup) 
                            ? 
                            nextGroup / groupSize
                            :
                            (nextGroup += groupSize) / groupSize
                            into itemGroup
               select itemGroup.AsEnumerable();
A: 

This is my code, nice and short.

 <Extension()> Public Function Chunk(Of T)(ByVal this As IList(Of T), ByVal size As Integer) As List(Of List(Of T))
     Dim result As New List(Of List(Of T))
     For i = 0 To CInt(Math.Ceiling(this.Count / size)) - 1
         result.Add(New List(Of T)(this.GetRange(i * size, Math.Min(size, this.Count - (i * size)))))
     Next
     Return result
 End Function
Jonathan Allen
+1  A: 

hi,

i use this:

    public static IEnumerable<IEnumerable<T>> Partition<T>(this IEnumerable<T> instance, int partitionSize)
    {
        return instance
            .Select((value, index) => new { Index = index, Value = value })
            .GroupBy(i => i.Index / partitionSize)
            .Select(i => i.Select(i2 => i2.Value))
        ;
    }

greets! Elmer

Elmer
A: 

Edit - Nm doesn't work on infinite sequences

Arash
why would i want to split an infinite sequence in two?
Simon_Weaver
For divvying up the processing so that it can be performed in parallel. Useful in .NET 4.0.
Arash
+2  A: 

I have been using the Partition function I posted earlier quite often. The only bad thing about it was that is wasn't completely streaming. This is not a problem if you work with few elements in your sequence. I needed a new solution when i started working with 100.000+ elements in my sequence.

The following solution is a lot more complex (and more code!), but it is very efficient.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Collections;

namespace LuvDaSun.Linq
{
    public static class EnumerableExtensions
    {
        public static IEnumerable<IEnumerable<T>> Partition<T>(this IEnumerable<T> enumerable, int partitionSize)
        {
            /*
            return enumerable
                .Select((item, index) => new { Item = item, Index = index, })
                .GroupBy(item => item.Index / partitionSize)
                .Select(group => group.Select(item => item.Item)                )
                ;
            */

            return new PartitioningEnumerable<T>(enumerable, partitionSize);
        }

    }


    class PartitioningEnumerable<T> : IEnumerable<IEnumerable<T>>
    {
        IEnumerable<T> _enumerable;
        int _partitionSize;
        public PartitioningEnumerable(IEnumerable<T> enumerable, int partitionSize)
        {
            _enumerable = enumerable;
            _partitionSize = partitionSize;
        }

        public IEnumerator<IEnumerable<T>> GetEnumerator()
        {
            return new PartitioningEnumerator<T>(_enumerable.GetEnumerator(), _partitionSize);
        }

        IEnumerator IEnumerable.GetEnumerator()
        {
            return GetEnumerator();
        }
    }


    class PartitioningEnumerator<T> : IEnumerator<IEnumerable<T>>
    {
        IEnumerator<T> _enumerator;
        int _partitionSize;
        public PartitioningEnumerator(IEnumerator<T> enumerator, int partitionSize)
        {
            _enumerator = enumerator;
            _partitionSize = partitionSize;
        }

        public void Dispose()
        {
            _enumerator.Dispose();
        }

        IEnumerable<T> _current;
        public IEnumerable<T> Current
        {
            get { return _current; }
        }
        object IEnumerator.Current
        {
            get { return _current; }
        }

        public void Reset()
        {
            _current = null;
            _enumerator.Reset();
        }

        public bool MoveNext()
        {
            bool result;

            if (_enumerator.MoveNext())
            {
                _current = new PartitionEnumerable<T>(_enumerator, _partitionSize);
                result = true;
            }
            else
            {
                _current = null;
                result = false;
            }

            return result;
        }

    }



    class PartitionEnumerable<T> : IEnumerable<T>
    {
        IEnumerator<T> _enumerator;
        int _partitionSize;
        public PartitionEnumerable(IEnumerator<T> enumerator, int partitionSize)
        {
            _enumerator = enumerator;
            _partitionSize = partitionSize;
        }

        public IEnumerator<T> GetEnumerator()
        {
            return new PartitionEnumerator<T>(_enumerator, _partitionSize);
        }

        IEnumerator IEnumerable.GetEnumerator()
        {
            return GetEnumerator();
        }
    }


    class PartitionEnumerator<T> : IEnumerator<T>
    {
        IEnumerator<T> _enumerator;
        int _partitionSize;
        int _count;
        public PartitionEnumerator(IEnumerator<T> enumerator, int partitionSize)
        {
            _enumerator = enumerator;
            _partitionSize = partitionSize;
        }

        public void Dispose()
        {
        }

        public T Current
        {
            get { return _enumerator.Current; }
        }
        object IEnumerator.Current
        {
            get { return _enumerator.Current; }
        }
        public void Reset()
        {
            if (_count > 0) throw new InvalidOperationException();
        }

        public bool MoveNext()
        {
            bool result;

            if (_count < _partitionSize)
            {
                if (_count > 0)
                {
                    result = _enumerator.MoveNext();
                }
                else
                {
                    result = true;
                }
                _count++;
            }
            else
            {
                result = false;
            }

            return result;
        }

    }
}

Enjoy!

Elmer
very cool. thanks.
Simon_Weaver
+1  A: 

Ok, I'll throw my hat in the ring. The advantages of my algorithm:

  1. No expensive division or modulus operators
  2. All operations are O(1)
  3. Works for IEnumerable<> source (no Count property needed)
  4. Simple

The code:

public static IEnumerable<IEnumerable<T>>
  Section<T>(this IEnumerable<T> source, int length)
{
  if (length <= 0)
    throw new ArgumentOutOfRangeException("length");

  var section = new LinkedList<T>();

  foreach (var item in source)
  {
    section.AddLast(item);

    if (section.Count == length)
    {
      yield return section.AsReadOnly();
      section = new LinkedList<T>();
    }
  }

  if (section.Count > 0)
    yield return section.AsReadOnly();
}

static IEnumerable<T> AsReadOnly<T>(this IEnumerable<T> source)
{
  foreach (var item in source)
    yield return item;
}
Mike