views: 234
answers: 3

Hi all,

First time posting to a Q&A site, but I have a fairly complex problem I've been looking at for a few days.

Background

At work we're implementing a new billing system, and we want to take the unprecedented step of auditing it on an ongoing basis against the old system, which is significantly more robust. The new billing system is a lot more flexible for our new rate plans, so marketing is really pushing us to get it in place.

We had our IT group develop, for a ridiculous amount of money, a report that runs at 8 AM each morning against the previous day's data, compares records for byte-count discrepancies, and generates a report. This isn't very useful for us: for one, it only runs the next day, and secondly, if it shows bad results, we have no indication of why we had a problem the day before.

So we want to build our own system that hooks into any possible data source (at first, only the new and old systems' User Data Records (UDRs)) and compares the results in near real-time.

Just some notes on the scale: each billing system produces roughly 6 million records per day, at a total file size of about 1 GB.

My proposed set-up

Essentially, buy some servers. We have budget for several 8-core / 32 GB RAM machines, so I'd like to do all the processing and storage in in-memory data structures. We can buy bigger servers if necessary, but after a couple of days I don't see any reason to keep the data in memory any longer; it would be written out to persistent storage, with aggregate statistics stored in a database.

Each record essentially contains a record-id from the platform, correlation-id, username, login-time, duration, bytes-in, bytes-out, and a few other fields.

I was thinking of using a fairly complex data structure for processing. Each record would be broken into a user object and a record object belonging to either platform A or platform B. At the top level would be a self-balancing binary search tree keyed on the username. The next level would be something like a skip list based on date, so we would have links to the next matched record, next hour, next day, next month, next year, and so on. Finally, there would be the matched-record object itself: essentially just a holder which references the UDR record object from system A and the UDR record object from system B.
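
For concreteness, here's a rough C# sketch of what I have in mind (the names are just illustrative, and SortedDictionary stands in for a hand-rolled self-balancing tree):

using System;
using System.Collections.Generic;

// Minimal stub; the real record also carries record-id, correlation-id, etc.
class UdrRecord {
    public string Username { get; set; }
    public DateTime LoginTime { get; set; }
    public long BytesIn { get; set; }
    public long BytesOut { get; set; }
}

// Holder pairing the same logical record from both platforms
class MatchedRecord {
    public UdrRecord SystemA { get; set; }
    public UdrRecord SystemB { get; set; }
}

// Skip-list-style node: forward links at several time granularities
class TimeNode {
    public DateTime When { get; set; }
    public MatchedRecord Match { get; set; }
    public TimeNode NextRecord, NextHour, NextDay, NextMonth, NextYear;
}

class MatchIndex {
    // Top level: self-balancing tree keyed on username
    // (SortedDictionary is red-black-tree-backed in .NET)
    private readonly SortedDictionary<string, TimeNode> byUser =
        new SortedDictionary<string, TimeNode>();
}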

I'd run a number of internal analytics as data is added to see if the new billing system has choked or started showing large discrepancies compared to the old system, and send an alarm to our operations center to be investigated. I don't have any problem with this part myself.

Problem

Aggregate statistics are great, but I want to see if I can come up with a sort of query language where the user can enter a query for, say, the top contributors to an alarm, see which records contributed to the discrepancy, and dig in and investigate. Originally I wanted to use a syntax similar to a Wireshark filter, with some SQL mixed in.

Example:

udr.bytesin > 1000 && (udr.analysis.discrepancy > 100000 || udr.analysis.discrepancy_percent > 100) && udr.started_date > '2008-11-10 22:00:44' order by udr.analysis.discrepancy DESC LIMIT 10

The other option would be to use DLINQ, but I've been out of the C# game for a year and a half now, so I'm not 100% up to speed on the .NET 3.5 stuff. I'm also not sure whether it could handle the data structure I was planning on using. The real question is: can I get any feedback on how to approach taking a query string from the user, parsing it, applying it to the data structure (which has quite a few more attributes than outlined above), and getting the resulting list back? I can handle the rest on my own.

I am fully prepared to hard-code much of the possible queries and just have them as reports that are run with some parameters, but if there is a nice, clean way of doing this type of query syntax, I think it would be an immensely cool feature to add.

Thanks for your help.

+1  A: 

Actually, for the above type of query, the dynamic LINQ stuff is quite a good fit. Otherwise you'll have to write pretty much the same thing anyway: a parser, and a mechanism for mapping it to attributes. Unfortunately it isn't an exact hit, since you need to split out things like OrderBy, and dates need to be parameterized - but here's a working example:

// Requires the Dynamic LINQ sample source (System.Linq.Dynamic)
// for the string-based Where/OrderBy overloads
using System;
using System.Linq;
using System.Linq.Dynamic;

class Udr { // formatted for space
    public int BytesIn { get; set; }
    public UdrAnalysis Analysis { get; set; }
    public DateTime StartedDate { get; set; }
}
class UdrAnalysis {
    public int Discrepancy { get; set; }
    public int DiscrepancyPercent { get; set; }
}
static class Program {
    static void Main() {
        Udr[] data = new[] {
              new Udr { BytesIn = 50000, StartedDate = DateTime.Today,
                 Analysis = new UdrAnalysis { Discrepancy = 50000, DiscrepancyPercent = 130 } },
              new Udr { BytesIn = 500, StartedDate = DateTime.Today,
                 Analysis = new UdrAnalysis { Discrepancy = 50000, DiscrepancyPercent = 130 } }
        };
        // Dates must be passed as parameters (@0) rather than written inline
        DateTime when = DateTime.Parse("2008-11-10 22:00:44");
        var query = data.AsQueryable().Where(
            @"bytesin > 1000 && (analysis.discrepancy > 100000
                || analysis.discrepancypercent > 100)
                && starteddate > @0", when)
            .OrderBy("analysis.discrepancy DESC") // ordering is split out separately
            .Take(10);
        foreach (var item in query) {
            Console.WriteLine(item.BytesIn);
        }
    }
}

Of course, you could take the dynamic LINQ sample and customize the parser to do more of what you need...

Marc Gravell
+1  A: 

Whether you use DLINQ or not, I suspect that you'll want to use LINQ somewhere in the solution, because it provides so many bits of what you want.

How much protection do you need from your users, and how technical are they? If this is only for a few very technical internal staff (e.g. people who are already developers) then you could just let them write a C# expression, use CSharpCodeProvider to compile the code, and then apply it to your data.

Obviously this requires your users to be able to write C# - or at least just enough of it for a query expression - and it requires that you trust them not to trash the server. (You can load the code into a separate AppDomain, give it low privileges and tear down the AppDomain after a timeout, but that sort of thing is complicated to achieve - and you don't really want huge amounts of data crossing an AppDomain boundary.)
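
For example, a rough sketch of the compile-an-expression approach (the wrapper class and method names here are mine, not a real API; it assumes a public Udr type in the global namespace, and does no sandboxing):

using System;
using System.CodeDom.Compiler;
using System.Reflection;
using Microsoft.CSharp;

static class UserQueryCompiler {
    // Compiles a user-supplied boolean C# expression into a predicate
    public static Func<Udr, bool> Compile(string expression) {
        string source =
            "public static class UserQuery {" +
            "    public static bool Matches(Udr udr) { return " + expression + "; }" +
            "}";
        var options = new CompilerParameters { GenerateInMemory = true };
        // Let the generated code see the Udr type
        options.ReferencedAssemblies.Add(typeof(Udr).Assembly.Location);
        var provider = new CSharpCodeProvider();
        CompilerResults results = provider.CompileAssemblyFromSource(options, source);
        if (results.Errors.HasErrors) {
            throw new ArgumentException(results.Errors[0].ErrorText);
        }
        MethodInfo matches = results.CompiledAssembly
            .GetType("UserQuery").GetMethod("Matches");
        return udr => (bool)matches.Invoke(null, new object[] { udr });
    }
}

You'd then call something like UserQueryCompiler.Compile("udr.BytesIn > 1000") and pass the resulting delegate to Where.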

Jon Skeet
A: 

On the subject of LINQ in general - again, a good fit due to your size issues:

Just some notes on the scale: each billing system produces roughly 6 million records per day, at a total file size of about 1 GB.

LINQ can be used fully with streaming solutions. For example, your "source" could be a file reader. The Where clause would then iterate over the data, checking individual rows, without having to buffer the entire thing in memory:

    // Assumes a trivial Foo type, e.g.
    //   class Foo { public string Name { get; set; } public int Size { get; set; } }
    static IEnumerable<Foo> ReadFoos(string path) {
        // Parse each pipe-delimited line into a Foo as it is read
        return from line in ReadLines(path)
               let parts = line.Split('|')
               select new Foo { Name = parts[0],
                   Size = int.Parse(parts[1]) };
    }
    static IEnumerable<string> ReadLines(string path) {
        using (var reader = File.OpenText(path)) {
            string line;
            // Yield one line at a time; the file is never fully buffered
            while ((line = reader.ReadLine()) != null) {
                yield return line;
            }
        }
    }

This is now lazy loading... we only read one line at a time. You'll need to use AsQueryable() to use it with dynamic LINQ, but it stays lazy.
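
For example, combining the streaming reader above with the string-based dynamic LINQ syntax (illustrative only; the file name is made up):

// Streams the file and applies a dynamic-LINQ filter lazily
var matches = ReadFoos("data.txt")
    .AsQueryable()
    .Where("size > 1000")
    .OrderBy("size DESC") // note: sorting does have to buffer the matches
    .Take(10);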

If you need to perform multiple aggregates over the same data, then Push LINQ is a good fit; this works particularly well if you need to group data, since it doesn't buffer everything.
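
To illustrate the single-pass idea (this is a hand-rolled sketch, not the Push LINQ API; ReadUdrs is a hypothetical streaming reader like ReadFoos above):

// Several aggregates computed in one pass over the stream -
// the pattern Push LINQ generalizes with composable futures
long count = 0, totalBytesIn = 0, bigDiscrepancies = 0;
foreach (var udr in ReadUdrs("udr.dat")) {
    count++;
    totalBytesIn += udr.BytesIn;
    if (udr.Analysis.Discrepancy > 100000) bigDiscrepancies++;
}
Console.WriteLine("{0} records, {1} bytes in, {2} large discrepancies",
    count, totalBytesIn, bigDiscrepancies);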

Finally - if you want binary storage, serializers like protobuf-net can be used to create streaming solutions. At the moment it works best with the "push" approach of Push LINQ, but I expect I could invert it for regular IEnumerable<T> if needed.
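
As a sketch of what that inversion might look like from the consuming side (assumptions: records were written with SerializeWithLengthPrefix, Udr carries protobuf-net's [ProtoContract]/[ProtoMember] attributes, and DeserializeWithLengthPrefix returns null at end of stream):

using System.Collections.Generic;
using System.IO;
using ProtoBuf;

static IEnumerable<Udr> ReadUdrs(Stream source) {
    while (true) {
        // Assumption: returns null once the stream is exhausted
        Udr udr = Serializer.DeserializeWithLengthPrefix<Udr>(
            source, PrefixStyle.Base128);
        if (udr == null) yield break;
        yield return udr;
    }
}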

Marc Gravell
Have you integrated Push LINQ directly with protobuf-net then? My port has a MessageStreamIterator which implements IEnumerable<TMessage> for whatever message type you're interested in - it works fine for normal LINQ. Works with Push LINQ in the normal way too, of course.
Jon Skeet
It isn't directly integrated, no; however, there is a sample in the repo of doing this. Basically, the deserialization supports sequences via IEnumerable<T> and Add(T) (as an alternative to IList<T>, etc.), and it is trivial to create a Push LINQ feed where the Add pushes a value through Push LINQ.
Marc Gravell
What I have yet to do is to make the deserialization produce an IEnumerable<T> directly...
Marc Gravell