I have a simple text file containing some CSV with the following structure:

@Parent1_Field1, Parent1_Field2, Parent1_Field3
Child1_Field1, Child1_Field2
Child2_Field1, Child2_Field2
...etc.
@Parent2_Field1, Parent2_Field2, Parent2_Field3
Child1_Field1, Child1_Field2
Child2_Field1, Child2_Field2
...etc.

A leading '@' marks a parent object; the lines immediately below it, up to the next '@' line, are its child objects. (XML would represent this better, but that's not an option in my case.)

My goal is to use LINQ to query this file without loading its entire content into memory. First, I created a class (here: MyCustomReader) that implements IEnumerable<string> and uses a StreamReader to return each line of the file.
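The question doesn't show MyCustomReader, so here is a minimal sketch of such a class. It assumes the reader takes a file path in its constructor; the constructor and field names are hypothetical. Because GetEnumerator is an iterator, lines are read lazily, and every enumeration re-opens and re-reads the file. That matters for queries that enumerate the reader more than once. (The built-in File.ReadLines(path) also returns lines lazily.)

```csharp
using System.Collections;
using System.Collections.Generic;
using System.IO;

// Hypothetical sketch of a streaming line reader: only the current line
// is held in memory while the sequence is enumerated.
public sealed class MyCustomReader : IEnumerable<string>
{
    private readonly string _path;

    public MyCustomReader(string path) => _path = path;

    public IEnumerator<string> GetEnumerator()
    {
        // A new StreamReader per enumeration, so the reader can be
        // enumerated several times (each pass reads the file again).
        using (var reader = new StreamReader(_path))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                yield return line;
            }
        }
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
```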

E.g. the following gets all Parent objects (without the children):

from line in MyCustomReader
where line.StartsWith("@")
select Parent.Create(line)
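Parent.Create isn't shown either. A possible sketch, assuming the first comma-separated field (without the leading '@') is the parent's unique name; the UniqueName and Fields property names and the AllParents method are assumptions. AllParents is the method-syntax form of the query expression above and streams in the same way.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical Parent type; the real class is not shown in the question.
public sealed class Parent
{
    public string UniqueName { get; private set; }
    public string[] Fields { get; private set; }

    // Parses a line such as "@Parent1_Field1, Parent1_Field2, Parent1_Field3",
    // assuming the first field is the unique name.
    public static Parent Create(string line)
    {
        string[] fields = line.TrimStart('@')
                              .Split(',')
                              .Select(f => f.Trim())
                              .ToArray();
        return new Parent { UniqueName = fields[0], Fields = fields };
    }
}

public static class ParentQueries
{
    // Equivalent of: from line in lines where line.StartsWith("@") select Parent.Create(line)
    public static IEnumerable<Parent> AllParents(IEnumerable<string> lines) =>
        lines.Where(line => line.StartsWith("@"))
             .Select(line => Parent.Create(line));
}
```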

However, I got stuck when I wanted to write queries that involve both Parent and Child objects: for instance, getting all the children of a particular parent, or getting all the parents that have a child whose field equals a given value.

E.g. this gets all the children for a particular Parent object:

public IEnumerable<Child> GetChildrenForAParent(string uniqueParentName)
{
    Parent parent = null;
    foreach (string line in MyCustomReader)
    {
        if (line.StartsWith("@"))
            parent = Parent.Create(line);
        else if (parent != null && parent.UniqueName == uniqueParentName)
            yield return Child.Create(line);
    }
}

And the second example:

public IEnumerable<Parent> GetParentsWhereChildHasThisValue(string childFiledValue)
{
    Parent parent = null;
    foreach (string line in MyCustomReader)
    {
        if (line.StartsWith("@"))
        {
            parent = Parent.Create(line);
        }
        else //child
        {
            Child child = Child.Create(line);
            if (parent != null && child.FiledValue == childFiledValue)
                yield return parent;
        }
    }
}

How could these two examples be achieved using LINQ?

A:

This isn't pretty, but for the first one something like the following should work:

MyCustomReader.SkipWhile(line => line != uniqueParentName)
              .Skip(1)
              .TakeWhile(line => !line.StartsWith("@"));
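A self-contained sketch of this first query wrapped in a method. It assumes the first field of a parent line is the parent's unique name and compares that field rather than the whole line (line != uniqueParentName only matches when uniqueParentName is the entire parent line, '@' included). ParentName, IsParent and GetChildLines are hypothetical names. Like the query above, it streams, and it returns only the children of the first matching parent.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ChildQueries
{
    // Hypothetical helper: "@Parent1_Field1, ..." -> "Parent1_Field1",
    // assuming the first field is the unique name.
    static string ParentName(string line) =>
        line.Substring(1).Split(',')[0].Trim();

    static bool IsParent(string line) => line.StartsWith("@");

    // Skip up to the matching parent line, skip that line itself,
    // then take lines until the next parent line.
    public static IEnumerable<string> GetChildLines(
        IEnumerable<string> lines, string uniqueParentName) =>
        lines.SkipWhile(line => !IsParent(line) || ParentName(line) != uniqueParentName)
             .Skip(1)
             .TakeWhile(line => !IsParent(line));
}
```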

EDIT: OK, so I'm bored. I think this will do the second one for you (though it's obviously not a problem that LINQ is suited to):

var res = MyCustomReader.Where(parentLine => parentLine.StartsWith("@"))
         .Join(MyCustomReader.Where(childLine => !childLine.StartsWith("@")),
              parentLine => parentLine,
              childLine => MyCustomReader.Reverse<string>()
                   .SkipWhile(z => z != childLine)
                   .SkipWhile(x => !x.StartsWith("@")).First(),
              (x, y) => new { Parent = x, Child = y })
         .Where(a => a.Child == childFiledValue).Select(a => a.Parent);
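Another option, offered as a sketch rather than a tested recipe: a small iterator that groups each parent line with the child lines below it. Both of the question's examples then become ordinary Where/SelectMany/Any queries, the file is read in a single pass, and only the children of one parent are buffered at a time. ParentGroup and GroupByParent are hypothetical names; the usage comments rely on the question's own Parent.Create and Child.Create, which are not defined here.

```csharp
using System.Collections.Generic;

// Hypothetical type: one parent line plus the child lines below it.
public sealed class ParentGroup
{
    public string ParentLine { get; }
    public List<string> ChildLines { get; } = new List<string>();
    public ParentGroup(string parentLine) => ParentLine = parentLine;
}

public static class GroupingExtensions
{
    // Streams the lines and yields one ParentGroup per '@' line.
    // Child lines that appear before the first parent are ignored.
    public static IEnumerable<ParentGroup> GroupByParent(this IEnumerable<string> lines)
    {
        ParentGroup current = null;
        foreach (string line in lines)
        {
            if (line.StartsWith("@"))
            {
                if (current != null) yield return current;  // previous group is complete
                current = new ParentGroup(line);
            }
            else if (current != null)
            {
                current.ChildLines.Add(line);
            }
        }
        if (current != null) yield return current;  // last group in the file
    }
}

// Usage with the question's types (not defined here):
//   reader.GroupByParent()
//         .Where(g => Parent.Create(g.ParentLine).UniqueName == uniqueParentName)
//         .SelectMany(g => g.ChildLines.Select(line => Child.Create(line)));
//
//   reader.GroupByParent()
//         .Where(g => g.ChildLines.Any(c => Child.Create(c).FiledValue == childFiledValue))
//         .Select(g => Parent.Create(g.ParentLine));
```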
Simon Fox
That will get the lines; you might want to add .Select(line => Child.Create(line)) to the end.
Simon Fox
It works fine. Thanks
You may need to find a ninja for the second one :)
Simon Fox
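Actually, the SkipWhile on the Reverse won't handle duplicate lines: identical child lines under different parents all get matched to the parent of the last occurrence in the file...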
Simon Fox
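One way around the duplicate-line problem, sketched under the same assumption as the answer (child lines are compared as whole strings): pair each child line with the most recent parent line while streaming. Since the parent is tracked by position, duplicate child lines are attributed correctly, and no Reverse() (which buffers the whole sequence) is needed. WithParents and ParentsWithChild are hypothetical names. Distinct() keeps a set of parents already returned, so its memory grows with the number of matching parents, not with the file size; it also assumes parent lines are unique.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ParentChildPairs
{
    // Pairs every child line with the last '@' line seen before it.
    // Child lines before the first parent are skipped.
    public static IEnumerable<(string Parent, string Child)> WithParents(
        this IEnumerable<string> lines)
    {
        string parent = null;
        foreach (string line in lines)
        {
            if (line.StartsWith("@"))
                parent = line;
            else if (parent != null)
                yield return (parent, line);
        }
    }

    // Parents with at least one child line equal to childValue,
    // each returned once.
    public static IEnumerable<string> ParentsWithChild(
        IEnumerable<string> lines, string childValue) =>
        lines.WithParents()
             .Where(p => p.Child == childValue)
             .Select(p => p.Parent)
             .Distinct();
}
```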