ansaurus

Question

Answer 1


It's not entirely clear to me exactly what you're trying to do, but I would approach this problem by first writing a PartitionLines extension method like this:

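using System;
using System.Collections.Generic;

// Extension methods must live in a non-generic static class; the class name is arbitrary.
public static class LineExtensions
{
    // Splits a sequence of lines into groups. A new group starts whenever
    // groupMarkerSelector(line) equals delimiter; that marker line becomes
    // the first line of the new group.
    public static IEnumerable<IEnumerable<string>> PartitionLines(
        this IEnumerable<string> source,
        Func<string, string> groupMarkerSelector,
        string delimiter)
    {
        List<string> currentGroup = new List<string>();

        foreach (string line in source)
        {
            var key = groupMarkerSelector(line);
            if (delimiter == key && currentGroup.Count > 0)
            {
                // Hand off the finished group and start a fresh list, so a
                // group the caller already received is never mutated later.
                yield return currentGroup;
                currentGroup = new List<string>();
            }

            currentGroup.Add(line);
        }

        // Emit the trailing group, if any.
        if (currentGroup.Count > 0)
            yield return currentGroup;
    }
}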

(Note that my function loads one group at a time into memory; I assume this is OK.)

I'd then use it like this:

var line30Groups =
    TextFileLineEnumerator().
    PartitionLines(l => l.Substring(0, 2), "30");

Now you've got the lines in groups, with a new group starting each time you see a "30" line. You could subdivide further:

var line3040Groups =
    TextFileLineEnumerator().
    PartitionLines(l => l.Substring(0, 2), "30").Select(g =>
        g.PartitionLines(l => l.Substring(0, 2), "40"));

Now you've got the lines in groups under each "30", and each of those groups is itself an enumerable of groups under each child "40". And so on.
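Because TextFileLineEnumerator() is the asker's own function and isn't shown, the sketch below runs the same two-level "30"/"40" partitioning over an in-memory array instead. The sample records and the RecordType helper are hypothetical; RecordType is an assumed guard that returns the whole line when it is shorter than two characters, since l.Substring(0, 2) would throw on a blank or one-character line. PartitionLines is repeated so the program compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LineExtensions
{
    // Same grouping logic as PartitionLines in the answer above.
    public static IEnumerable<IEnumerable<string>> PartitionLines(
        this IEnumerable<string> source,
        Func<string, string> groupMarkerSelector,
        string delimiter)
    {
        var currentGroup = new List<string>();
        foreach (string line in source)
        {
            if (groupMarkerSelector(line) == delimiter && currentGroup.Count > 0)
            {
                yield return currentGroup;
                currentGroup = new List<string>();
            }
            currentGroup.Add(line);
        }
        if (currentGroup.Count > 0)
            yield return currentGroup;
    }

    // Hypothetical helper: first two characters of a line, or the whole
    // line if it is shorter, so short or blank lines do not throw.
    public static string RecordType(string line) =>
        line.Length >= 2 ? line.Substring(0, 2) : line;
}

public static class Program
{
    public static void Main()
    {
        // Hypothetical sample data standing in for TextFileLineEnumerator().
        string[] lines =
        {
            "30 header A",
            "40 child A1",
            "50 detail A1a",
            "40 child A2",
            "30 header B",
            "40 child B1",
        };

        // First level: one group per "30" record.
        // Second level: inside each of those, one group per "40" record.
        var line3040Groups = lines
            .PartitionLines(LineExtensions.RecordType, "30")
            .Select(g => g.PartitionLines(LineExtensions.RecordType, "40"));

        foreach (var group30 in line3040Groups)
        {
            Console.WriteLine("--- 30 group ---");
            foreach (var group40 in group30)
                Console.WriteLine("  [" + string.Join(" | ", group40) + "]");
        }
    }
}
```

Note that the "30" line itself precedes any "40" line in its group, so it ends up as a one-line leading subgroup at the second level ("[30 header A]" before "[40 child A1 | 50 detail A1a]").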

This is untested and could be cleaner, but you get the picture, I hope.

mquander 2010-09-03 18:23:48

I think you'll want to `yield return currentGroup.ToArray()` or something like that instead of `currentGroup` itself, since otherwise the OP could end up calling `PartitionLines(s => s.Substring(0, 2), "30").ToList()` and getting a list whose elements are all references to the same `List<string>` object, holding a single set of elements.

Dan Tao 2010-09-03 18:44:02

Dan Tao, I agree; I hastily got that wrong. I think the cleanest fix is to assign `currentGroup = new List<string>()` instead of clearing the list. I've edited my post.

mquander 2010-09-03 19:00:17
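The aliasing problem discussed above can be reproduced with a small sketch. The pre-edit code isn't shown on this page, so PartitionLinesClearing is a hypothetical reconstruction that reuses one list and calls Clear(), as the comments describe; PartitionLinesFresh allocates a new list per group, as in the edited answer. Both method names and the sample lines are illustrative assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AliasingDemo
{
    // BUGGY (hypothetical reconstruction): one List is reused and cleared,
    // so every group handed to the caller is the same object.
    public static IEnumerable<IEnumerable<string>> PartitionLinesClearing(
        IEnumerable<string> source, Func<string, string> marker, string delimiter)
    {
        var currentGroup = new List<string>();
        foreach (var line in source)
        {
            if (marker(line) == delimiter && currentGroup.Count > 0)
            {
                yield return currentGroup;
                currentGroup.Clear(); // mutates a list the caller already holds
            }
            currentGroup.Add(line);
        }
        if (currentGroup.Count > 0)
            yield return currentGroup;
    }

    // FIXED: a fresh List per group, so earlier groups stay intact.
    public static IEnumerable<IEnumerable<string>> PartitionLinesFresh(
        IEnumerable<string> source, Func<string, string> marker, string delimiter)
    {
        var currentGroup = new List<string>();
        foreach (var line in source)
        {
            if (marker(line) == delimiter && currentGroup.Count > 0)
            {
                yield return currentGroup;
                currentGroup = new List<string>();
            }
            currentGroup.Add(line);
        }
        if (currentGroup.Count > 0)
            yield return currentGroup;
    }

    public static void Main()
    {
        string[] lines = { "30 a", "40 b", "30 c", "40 d" };
        Func<string, string> marker = l => l.Substring(0, 2);

        // ToList() buffers every group before any of them is inspected.
        var buggy = PartitionLinesClearing(lines, marker, "30").ToList();
        var fresh = PartitionLinesFresh(lines, marker, "30").ToList();

        Console.WriteLine(ReferenceEquals(buggy[0], buggy[1])); // True
        Console.WriteLine(string.Join(",", buggy[0]));          // 30 c,40 d
        Console.WriteLine(string.Join(",", fresh[0]));          // 30 a,40 b
    }
}
```

The clearing version only looks correct when each group is fully consumed before the enumerator advances (for example, in a nested foreach); buffering with ToList() exposes the shared list. Copying on yield (ToArray() or ToList()) and allocating a new list both avoid the problem.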

Excellent solution, mquander. I did run into the issue Dan mentioned with the repeated `List<string>` instance, but calling ToList() on the yielded currentGroup, as he recommended, fixed it. Thank you for your efforts!

Pierreten 2010-09-03 19:04:16

Just saw the edit. That works too, and it doesn't force an evaluation immediately, which is more desirable.

Pierreten 2010-09-03 19:05:09


Parsing Text Data File With Linq
