I'm reading a binary file into a BindingList(Of T) to be bound to a DataGridView.
Each line in the file represents a single transaction, but I need to consolidate and/or filter transactions that meet certain criteria.

I know how to do this from a mechanical standpoint (looping over the list and, for each item, either adding it as a new item or merging its data into an existing one), but I'm looking for a practice, pattern, existing component, or something else that I'm missing (I'm drawing a blank on keywords to search for).

I don't want to reinvent the wheel if I don't have to. I'm particularly concerned with speed and performance, since some runs will have 100k-plus records to process.

Currently working with .NET 2.0, but will move to 3.5 if a particularly sexy solution exists.


Update: I've changed the solution to 3.5, so that's no longer an issue. I should have pointed out that this project is VB.NET, but I may add a new C# library for this particular function to take advantage of C# iterators.

+1  A: 

Yes, you want 3.5 because this gives you LINQ -- language-integrated query.

There is a slight performance cost, but for huge recordsets you can offset this by using PLINQ (parallel processing).
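As a rough sketch of the shape that takes (assuming source is an in-memory IEnumerable<string> and that PLINQ is available to you; it shipped after 3.5, so treat this as illustrative):

var bigOnes = source.AsParallel()              // opt the query into parallel execution
                    .Where(s => s.Length > 24) // the filter now runs across cores
                    .ToList();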

LINQ is a declarative, functional way to deal with sets.

Concepts you'll need:
- lambda expressions, e.g. person => person.Age (used throughout the examples below)
- extension methods (see the short sketch below)
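
If you have never seen an extension method, it is just a static method in a static class whose first parameter is marked with this, which is what lets it chain with dot syntax on any sequence; the LINQ operators below are built exactly this way. A minimal, made-up example (names are hypothetical; it needs using System.Collections.Generic):

public static class SequenceExtensions
{
    // Callable as someStrings.LongerThan(24) on any IEnumerable<string>.
    public static IEnumerable<string> LongerThan(this IEnumerable<string> source, int length)
    {
        foreach (string s in source)
            if (s.Length > length)
                yield return s;   // also an iterator: results stream out lazily
    }
}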

Consider a set of 10,000 strings from which you want the first 100 that are longer than 24 characters:

var result = source.Where(s => s.Length > 24).Take(100);

From a set of Person objects you want to return the names, but they are divided into firstName and lastName properties:

var result = source.Select(person => person.firstName + person.lastName);

This returns IEnumerable<string>.

From the same set you want the average age:

var result = source.Average(person => person.Age);

Youngest 10 people:

var result = source.OrderBy(person => person.Age).Take(10);

Everybody, grouped by the first letter of their last names:

var result = source.GroupBy(person => person.lastName[0]);

This returns IEnumerable<IGrouping<char, Person>>.
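
Each element of that sequence is one group, i.e. a key plus everyone who shares it; for example:

foreach (IGrouping<char, Person> grp in result)
{
    char letter = grp.Key;       // the first letter of the last name
    int howMany = grp.Count();   // each group is itself a sequence of Person
}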

Names of the oldest 25 people whose last name starts with S:

var result = source.Where(person => person.lastName.StartsWith("S"))
   .OrderByDescending(person => person.Age)
   .Take(25)
   .Select(person => person.firstName + person.lastName);

Just imagine how much code you'd have to write in a foreach loop to accomplish this, and how much room there would be to introduce defects or miss optimizations along the way. The declarative nature of the LINQ syntax makes it easier to read and maintain.
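
For a sense of scale, here is roughly what that last query looks like written by hand against the same Person members (a sketch, pre-LINQ style):

List<Person> matches = new List<Person>();
foreach (Person person in source)
{
    if (person.lastName.StartsWith("S"))
        matches.Add(person);
}

// Sort descending by age, then project the first 25 names manually.
matches.Sort(delegate(Person a, Person b) { return b.Age.CompareTo(a.Age); });

List<string> names = new List<string>();
for (int i = 0; i < matches.Count && i < 25; i++)
    names.Add(matches[i].firstName + matches[i].lastName);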

There is an alternate syntax that is sort of SQL-ish, but shows how you are really defining queries against an arbitrary set of objects. Consider that you want to get people whose first name is "Bob":

var result = 
    from person in source
    where person.firstName == "Bob"
    select person;

It looks bizarre, but this is valid C# code if you jump up from 2.0.

My only warning is that once you work with LINQ you may refuse ever to work in 2.0 again.

There are lots of great resources available for learning LINQ syntax -- it doesn't take long.


Update

Additional considerations in response to the first comment:

You already have a very powerful tool at your disposal with C# 2.0 -- iterators.

Consider:

public class Foo
{
    private IEnumerable<Record> GetRecords()
    {
        while (/* there is more data to read */)
        {
            Record record = // do I/O stuff, instantiate a record

            // Hand this record back immediately; execution resumes here
            // the next time the caller asks for another item.
            yield return record;
        }
    }

    public void DisplayRecords()
    {
        foreach (Record record in GetRecords())
        {
            // do something meaningful

            // display the record
        }
    }
}

So, what is remarkable about this? The GetRecords() method is an iterator block, and the yield keyword returns results as requested ("lazy evaluation").

This means that when you call DisplayRecords(), it will call GetRecords(), but as soon as GetRecords() has a Record, it will return it to DisplayRecords(), which can then do something useful with it. When the foreach block loops again, execution will return to GetRecords(), which will return the next item, and so on.

In this way, you don't have to wait for 100,000 records to be read from disk before you can start sorting and displaying results.
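
Because the standard LINQ operators are lazy too, they compose directly with an iterator like GetRecords(); Where streams item by item, though OrderBy has to see the whole sequence before it yields anything. A sketch (the Amount property is made up):

foreach (Record record in GetRecords().Where(r => r.Amount > 100m))
{
    // Each record is read from disk, tested, and handled one at a time;
    // nothing is buffered up front.
}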

This gives some interesting possibilities; whether or not this can be made useful in your situation is up to you (you wouldn't want to refresh the grid binding 100,000 times, for example).

Jay
I had considered LINQ, but I'd have to load hundreds of thousands of records into the list (from hundreds of files) before being able to filter. Ideally I'd like to evaluate each item as I read it from disk. I also don't think that LINQ will allow me to consolidate like items, incrementing a Quantity field, but I could be wrong.
Robert Lee
@Robert I added another thought on the matter. LINQ itself would not be able to consolidate like items, but you can use it to easily group those items, and then you could call a `Consolidate()` [extension?] method to combine them.
Jay
There are LINQ-like libraries for 2.0 since all you need are iterators and anonymous delegates. All you lose by not moving to 3.5 is the sweet, sweet syntactic sugar.
Gabe
A: 

It sounds like you want to do something like this in pseudo-LINQ: data.GroupBy().Select(Count()).Where() -- you group (consolidate) by some criteria, count the number in each group, and then filter by the results.
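
Spelled out with real operators, that pipeline might look something like this (the AccountNumber field is just an illustration):

var result = data.GroupBy(t => t.AccountNumber)                    // consolidate on some key
                 .Select(g => new { g.Key, Count = g.Count() })    // key plus how many collapsed into it
                 .Where(x => x.Count > 1);                         // keep only the groups you care about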

However, you suggest that you may have too much data to load into memory all at once, so you want to consolidate as you load the data. This can be accomplished with your own GroupByCount operator, somewhat like this over-simplified version:

using System.Collections.Generic;

public static class GroupingExtensions
{
    // The "this" modifier makes it an extension method, so it can be
    // chained as data.GroupByCount() like the built-in operators.
    public static IEnumerable<KeyValuePair<T, int>>
        GroupByCount<T>(this IEnumerable<T> input)
    {
        // Consolidation happens as the input is enumerated: only the
        // dictionary of distinct items (with counts) is held in memory.
        Dictionary<T, int> counts = new Dictionary<T, int>();
        foreach (T item in input)
            if (counts.ContainsKey(item))
                counts[item]++;
            else
                counts[item] = 1;
        return counts;
    }
}

Then you would just have data.GroupByCount().Where() and your data would all be consolidated as it loads because the foreach would only load the next item after processing the previous one.
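
Usage might look like this, with a hypothetical ReadTransactions iterator that yields records straight off the disk (the file name is made up):

// ReadTransactions yields one record at a time, and GroupByCount consolidates
// them as it enumerates, so the whole file never sits in memory as a flat list.
var interesting = ReadTransactions("trades.dat")
    .GroupByCount()
    .Where(pair => pair.Value > 1);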

Gabe
This is conceptually similar to what I have implemented at this point, but since the user may want to group by multiple fields in the object, I have to compare the grouped fields and only consolidate when all of them match. If they do match, I also have to merge some fields ($ amounts, etc.) in the objects. Count is also a field, but it is only incremented as items merge.
Robert Lee
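
For the multi-field consolidation Robert describes, one option in 3.5 is an anonymous type as a composite key, with the merged fields computed per group; all of the Transaction member names below are made up for illustration:

var consolidated = transactions
    .GroupBy(t => new { t.Symbol, t.TradeDate })   // consolidate only when every grouped field matches
    .Select(g => new Transaction
    {
        Symbol    = g.Key.Symbol,
        TradeDate = g.Key.TradeDate,
        Amount    = g.Sum(t => t.Amount),           // merge the dollar fields
        Quantity  = g.Sum(t => t.Quantity),
        Count     = g.Count()                       // how many source rows merged into this one
    });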