views:

534

answers:

4

Update: Here's a similar question


Suppose I have a DataTable with a few thousand DataRows in it.

I'd like to break up the table into chunks of smaller rows for processing.

I thought C#3's improved ability to work with data might help.

This is the skeleton I have so far:

DataTable Table = GetTonsOfData();

// Chunks should be any IEnumerable<Chunk> type
var Chunks = ChunkifyTableIntoSmallerChunksSomehow; // ** help here! **

foreach(var Chunk in Chunks)
{
   // Chunk should be any IEnumerable<DataRow> type
   ProcessChunk(Chunk);
}

Any suggestions on what should replace ChunkifyTableIntoSmallerChunksSomehow?

I'm really interested in how someone would do this with access C#3 tools. If attempting to apply these tools is inappropriate, please explain!


Update 3 (revised chunking as I really want tables, not ienumerables; going with an extension method--thanks Jacob):

Final implementation:

Extension method to handle the chunking:

public static class HarenExtensions
{
    public static IEnumerable<DataTable> Chunkify(this DataTable table, int chunkSize)
    {
        for (int i = 0; i < table.Rows.Count; i += chunkSize)
        {
            DataTable Chunk = table.Clone();

            foreach (DataRow Row in table.Select().Skip(i).Take(chunkSize))
            {
                Chunk.ImportRow(Row);
            }

            yield return Chunk;
        }
    }
}

Example consumer of that extension method, with sample output from an ad hoc test:

class Program
{
    static void Main(string[] args)
    {
        DataTable Table = GetTonsOfData();

        foreach (DataTable Chunk in Table.Chunkify(100))
        {
            Console.WriteLine("{0} - {1}", Chunk.Rows[0][0], Chunk.Rows[Chunk.Rows.Count - 1][0]);
        }

        Console.ReadLine();
    }

    static DataTable GetTonsOfData()
    {
        DataTable Table = new DataTable();
        Table.Columns.Add(new DataColumn());

        for (int i = 0; i < 1000; i++)
        {
            DataRow Row = Table.NewRow();
            Row[0] = i;

            Table.Rows.Add(Row);
        }

        return Table;
    }
}
+1  A: 

This seems like an ideal use-case for Linq's Skip and Take methods, depending on what you want to achieve with the chunking. This is completely untested, never entered in an IDE code, but your method might look something like this.

private List<List<DataRow>> ChunkifyTable(DataTable table, int chunkSize)
{
    List<List<DataRow>> chunks = new List<List<DaraRow>>();
    for (int i = 0; i < table.Rows.Count / chunkSize; i++)
    {
        chunks.Add(table.Rows.Skip(i * chunkSize).Take(chunkSize).ToList());
    }

    return chunks;
}
Jacob Proffitt
@Jacob: thanks for the suggestion. I made this return tables and implement IEnumerable with the yield keyword. I also turned it into an extension method as you suggested. Works great!
Michael Haren
Sweet! Glad to hear.
Jacob Proffitt
+3  A: 

This is quite readable and only iterates through the sequence once, perhaps saving you the rather bad performance characteristics of repeated redundant Skip() / Take() calls:

public IEnumerable<IEnumerable<DataRow>> Chunkify(DataTable table, int size)
{
    List<DataRow> chunk = new List<DataRow>(size);

    foreach (var row in table.Rows)
    {
        chunk.Add(row);
        if (chunk.Count == size)
        {
            yield return chunk;
            chunk = new List<DataRow>(size);
        }
    }

    if(chunk.Any()) yield return chunk;
}
mquander
This is certainly the most readable approach, but it forces the creation of an in-memory list for each chunk. That's reasonable for small chunk sizes but not for large ones. Also, I'd clear the chunk rather than creating a brand new one each time.I think the best solution would probably create an iterator which has a sub iterator, but that's certainly not going to be easy to read.
Damian Powell
A: 

Here's an approach that might work:

public static class Extensions
{
    public static IEnumerable<IEnumerable<T>> InPages<T>(this IEnumerable<T> enumOfT, int pageSize)
    {
        if (null == enumOfT) throw new ArgumentNullException("enumOfT");
        if (pageSize < 1) throw new ArgumentOutOfRangeException("pageSize");
        var enumerator = enumOfT.GetEnumerator();
        while (enumerator.MoveNext())
        {
            yield return InPagesInternal(enumerator, pageSize);
        }
    }
    private static IEnumerable<T> InPagesInternal<T>(IEnumerator<T> enumeratorOfT, int pageSize)
    {
        var count = 0;
        while (true)
        {
            yield return enumeratorOfT.Current;
            if (++count >= pageSize) yield break;
            if (false == enumeratorOfT.MoveNext()) yield break;
        }
    }
    public static string Join<T>(this IEnumerable<T> enumOfT, object separator)
    {
        var sb = new StringBuilder();
        if (enumOfT.Any())
        {
            sb.Append(enumOfT.First());
            foreach (var item in enumOfT.Skip(1))
            {
                sb.Append(separator).Append(item);
            }
        }
        return sb.ToString();
    }
}
[TestFixture]
public class Tests
{
    [Test]
    public void Test()
    {
        // Arrange
        var ints = new[] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
        var expected = new[]
        {
            new[] { 1, 2, 3 },
            new[] { 4, 5, 6 },
            new[] { 7, 8, 9 },
            new[] { 10      },
        };

        // Act
        var pages = ints.InPages(3);

        // Assert
        var expectedString = (from x in expected select x.Join(",")).Join(" ; ");
        var pagesString = (from x in pages select x.Join(",")).Join(" ; ");

        Console.WriteLine("Expected : " + expectedString);
        Console.WriteLine("Pages    : " + pagesString);

        Assert.That(pagesString, Is.EqualTo(expectedString));
    }
}
Damian Powell
A: 

Jacob wrote

This seems like an ideal use-case for Linq's Skip and Take methods, depending on what you want to achieve with the chunking. This is completely untested, never entered in an IDE code, but your method might look something like this.

private List<List<DataRow>> ChunkifyTable(DataTable table, int chunkSize)
{
    List<List<DataRow>> chunks = new List<List<DaraRow>>();
    for (int i = 0; i < table.Rows.Count / chunkSize; i++)
    {
        chunks.Add(table.Rows.Skip(i * chunkSize).Take(chunkSize).ToList());
    }

    return chunks;
}

Thanks for this Jacob - useful for me but I think the test in your example should be <= not <. If you use < and the number of rows is less than chunkSize the loop is never entered. Similarly the last partial chunk is not captured, only full chunks. As you've stated, the example is untested, etc so this is just an FYI in case someone else uses your code verbatim ;-)

David Clarke