views:

184

answers:

7

Given a generic List I would need some kind of index (in the database sense) that would allow me fast retrieval. The keys for this index would not be unique, so I can't use a dictionary. Here's what I have in mind: Given a class Foo { P1, P2, P3 } that may have data like this

{ "aaa", 111, "yes" }
{ "aaa", 112, "no" }
{ "bbb", 111, "no" }
{ "bbb", 220, "yes" }
{ "bbb", 220, "no" }
{ "ccc", 300, "yes" }

I would need to quickly access all the records where P1 is "bbb" (3rd,4th, and 5th) or all the ones where P2 is 111 (1st and 3rd). I could use a sorted List but if I need more than one way of sorting / indexing I would end up with duplicated lists.

Is there something built-in into the .NET framework or maybe an OS library that would do something like this? Thanks.

P.S. I mentioned "sorted List" with the idea that a sorted list will return / find an item much faster. I do not need the list to be necessarily sorted; I'm just looking for fast retrieval / finding.

+6  A: 
Patrick Karcher
I don't need them to be sorted per se, I just need fast access to these subsets.
pbz
How's that different from just looping through the list with a foreach? As far as I know that will end up being a loop in the end, i.e. no use of any index...
pbz
Your Dictionary<T, List<Foo>> is what I had in mind. In my particular case i4o turned out to be sufficient, but this may help someone else in the future. Thanks.
pbz
+1  A: 

One route would be to just use an embedded relational database a la SQLite (there's an ADO.NET binding here: http://sqlite.phxsoftware.com/)

Most data structures aren't going to meet your requirements unless you're willing to re-sort the list/whatever each time as you need a different ordering.

Joe
A: 

You might want to consider something like Lucene.Net, an indexing and search library. I don't know if this may be a more complex solution than you were looking for, but it would definitely meet your performance needs.

jamesaharvey
A: 

Why not use a HashSet to store the different instances of the Foo object (which will be unique) and then use a LINQ query to retrieve the ones that match the given criteria?

Something like:

var hash = new HashSet<Foo>
{
new Foo { P1 = "aaa", P2 = 111, P3 = "yes"},
new Foo { P1 = "aaa", P2 = 112, P3 = "no"},
new Foo { P1 = "bbb", P2 = 111, P3 = "no"},
new Foo { P1 = "bbb", P2 = 220, P3 = "yes"},
new Foo { P1 = "bbb", P2 = 220, P3 = "no"},
new Foo { P1 = "ccc", P2 = 300, P3 = "yes"},
};

var results = from match in hash
where match.P1 == "aaa"
select match;
Foovanadil
Forgot about the sorting need. You could add an order by clause to the LINQ query to handle sorting the resulting list (which is smarter then sorting the whole list first then filtering in most cases)
Foovanadil
How would it know that P1 is indexed? Wouldn't it be just as slow as a foreach? Thanks.
pbz
-1: This answer solves nothing, it's just like an array, unsorted at that, with extra overhead. Also note that he doesn't say he wants just one row for 111, he wants them all, fast. The above solution, given that none of the objects are actually duplicates, would store them all, and the Linq query would iterate over them all, as with a simple array. The real solution is to first figure out how far you need to go, and then if needed, implement an in-memory database-like structure with multiple indices.
Lasse V. Karlsen
+2  A: 

I've never actually had a chance to use it, but you may try i4o. Its supposed to provide indexes for in-memory objects for usage with Linq. You specify the indexes for a class using either attributes or as part of constructing the indexer, then you create an IndexableCollection.

At that point, you just query the collection using Linq, and the indexes work behind the scenes to optomize the access patterns for the data.

Chris
Sounds promising; I'll have a look at it...
pbz
pbz
+4  A: 

Don't ever forget this principle: Make it correct, make it clear, make it concise, make it fast. In that order. So, first code up the naive implementation:

static IEnumerable<T> GetByIndex<T>(
    List<T> list,
    Func<T, TIndex> func,
    TIndex key
) {
    return list.Where(x => func(x) == key);
}

Usage:

List<Test> tests = new List<Test>() {
            new Test { Name = "aaa", Value = 111, Valid = Valid.Yes },
            new Test { Name = "aaa", Value = 111, Valid = Valid.Yes },
            new Test { Name = "bbb", Value = 112, Valid = Valid.No },
            new Test { Name = "bbb", Value = 111, Valid = Valid.No },
            new Test { Name = "bbb", Value = 220, Valid = Valid.No },
            new Test { Name = "ccc", Value = 220, Valid = Valid.Yes }
};
IEnumerable<Test> lookup = GetByIndex(tests, x => x.Name, "bbb");

The above is correct, clear and concise. Almost surely it is fast enough for your purposes.

So, as far as making it fast you must first measure:

  1. Establish reasonable performance criterion.
  2. Establish a test-bed of real-world data.
  3. Profile the simple approach against the test-bed of real-world data. Note here that profiling includes deducing whether or not this functionality is a bottleneck in your application.

Then, if and only if this is not fast enough for you should you try to optimize. It wouldn't be too hard to implement an IndexedList<T> : ICollection<T> that would allow you to index off of various properties.

Here is a naive implementation that could get you started:

class IndexedList<T> : IEnumerable<T> {
    List<T> _list;
    Dictionary<string, Dictionary<object, List<T>>> _dictionary;
    Dictionary<string, Func<T, object>> _propertyDictionary;

    public IndexedList(IEnumerable<string> propertyNames) : this(propertyNames, new List<T>()) { }

    public IndexedList(IEnumerable<string> propertyNames, IEnumerable<T> source) {
        _list = new List<T>();
        _dictionary = new Dictionary<string, Dictionary<object, List<T>>>();
        _propertyDictionary = BuildPropertyDictionary(propertyNames);
        foreach (var item in source) {
            Add(item);
        }
    }

    static Dictionary<string, Func<T, object>> BuildPropertyDictionary(IEnumerable<string> keys) {
        var propertyDictionary = new Dictionary<string,Func<T,object>>();
        foreach (string key in keys) {
            ParameterExpression parameter = Expression.Parameter(typeof(T), "parameter");
            Expression property = Expression.Property(parameter, key);
            Expression converted = Expression.Convert(property, typeof(object));
            Func<T, object> func = Expression.Lambda<Func<T, object>>(converted, parameter).Compile();
            propertyDictionary.Add(key, func);
        }
        return propertyDictionary;
    }

    public void Add(T item) {
        _list.Add(item);
        foreach (var kvp in _propertyDictionary) {
            object key = kvp.Value(item);
            Dictionary<object, List<T>> propertyIndex;
            if (!_dictionary.TryGetValue(kvp.Key, out propertyIndex)) {
                propertyIndex = new Dictionary<object, List<T>>();
                _dictionary.Add(kvp.Key, propertyIndex);
            }
            List<T> list;
            if (!propertyIndex.TryGetValue(key, out list)) {
                list = new List<T>();
                propertyIndex.Add(key, list);
            }
            propertyIndex[key].Add(item);
        }
    }

    public IEnumerable<T> GetByIndex<TIndex>(string propertyName, TIndex index) {
        return _dictionary[propertyName][index];
    }

    public IEnumerator<T> GetEnumerator() {
        return _list.GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator() {
        return GetEnumerator();
    }
}

Usage:

List<Test> tests = new List<Test>() {
            new Test { Name = "aaa", Value = 111, Valid = Valid.Yes },
            new Test { Name = "aaa", Value = 111, Valid = Valid.Yes },
            new Test { Name = "bbb", Value = 112, Valid = Valid.No },
            new Test { Name = "bbb", Value = 111, Valid = Valid.No },
            new Test { Name = "bbb", Value = 220, Valid = Valid.No },
            new Test { Name = "ccc", Value = 220, Valid = Valid.Yes }
};
// build an IndexedList<Text> indexed by Name and Value
IndexedList<Test> indexed = new IndexedList<Test>(new List<string>() { "Name", "Value" }, tests);
// lookup where Name == "bbb"
foreach (var result in indexed.GetByIndex("Name", "bbb")) {
    Console.WriteLine(result.Value);
}

But see, the reason you don't do this unless the naive implementation is not already fast enough is because of the additional complexity you just added to your system. You just added new code to maintain, new code to test and might not gain anything if this isn't faster on your real-world data or is not a bottleneck of your application.

Jason
A: 

I know you said you couldn't use a Dictionary, but would the following work?

For your example data set:

{ "aaa", 111, "yes" }
{ "aaa", 112, "no"  }
{ "bbb", 111, "no"  }
{ "bbb", 220, "yes" }
{ "bbb", 220, "no"  }
{ "ccc", 300, "yes" }

You could use the following:

var p1Lookup = new Dictionary<string,int []>();
p1Lookup.Add( "aaa", new int [] {0, 1} );
p1Lookup.Add( "bbb", new int [] {2, 3, 4} );
p1Lookup.Add( "ccc", new int [] {5} );

var p2Lookup = new Dictionary<int,int []>();
p1Lookup.Add( 111, new int [] {0, 2} );
p1Lookup.Add( 112, new int [] {1} );
p1Lookup.Add( 220, new int [] {3, 4} );
p1Lookup.Add( 300, new int [] {5} );

var p3Lookup = new Dictionary<int,int []>();
p1Lookup.Add( "yes", new int [] {0, 3, 5} );
p1Lookup.Add(  "no", new int [] {1, 2, 4} );

Depending on the usage, you could build the look-up dictionaries just once

Joseph Gordon