ansaurus

Question

Answer 1

+6 A:

Patrick Karcher 2010-01-27 00:38:26

I don't need them to be sorted per se, I just need fast access to these subsets.

pbz 2010-01-27 00:43:00

How's that different from just looping through the list with a foreach? As far as I know that will end up being a loop in the end, i.e. no use of any index...

pbz 2010-01-27 01:00:29

Your Dictionary<T, List<Foo>> is what I had in mind. In my particular case i4o turned out to be sufficient, but this may help someone else in the future. Thanks.

pbz 2010-01-28 00:25:40

Answer 2

+1 A:

One route would be to just use an embedded relational database a la SQLite (there's an ADO.NET binding here: http://sqlite.phxsoftware.com/)

Most data structures aren't going to meet your requirements unless you're willing to re-sort the list/whatever each time as you need a different ordering.

Joe 2010-01-27 00:39:20

Answer 3

A:

You might want to consider something like Lucene.Net, an indexing and search library. I don't know if this may be a more complex solution than you were looking for, but it would definitely meet your performance needs.

jamesaharvey 2010-01-27 00:41:06

Answer 4

A:

Why not use a HashSet to store the different instances of the Foo object (which will be unique) and then use a LINQ query to retrieve the ones that match the given criteria?

Something like:

var hash = new HashSet<Foo>
{
new Foo { P1 = "aaa", P2 = 111, P3 = "yes"},
new Foo { P1 = "aaa", P2 = 112, P3 = "no"},
new Foo { P1 = "bbb", P2 = 111, P3 = "no"},
new Foo { P1 = "bbb", P2 = 220, P3 = "yes"},
new Foo { P1 = "bbb", P2 = 220, P3 = "no"},
new Foo { P1 = "ccc", P2 = 300, P3 = "yes"},
};

var results = from match in hash
where match.P1 == "aaa"
select match;

Foovanadil 2010-01-27 00:50:31

Forgot about the sorting need. You could add an order by clause to the LINQ query to handle sorting the resulting list (which is smarter then sorting the whole list first then filtering in most cases)

Foovanadil 2010-01-27 00:51:42

How would it know that P1 is indexed? Wouldn't it be just as slow as a foreach? Thanks.

pbz 2010-01-27 01:06:11

-1: This answer solves nothing, it's just like an array, unsorted at that, with extra overhead. Also note that he doesn't say he wants just one row for 111, he wants them all, fast. The above solution, given that none of the objects are actually duplicates, would store them all, and the Linq query would iterate over them all, as with a simple array. The real solution is to first figure out how far you need to go, and then if needed, implement an in-memory database-like structure with multiple indices.

Lasse V. Karlsen 2010-01-27 14:58:24

Answer 5

+2 A:

I've never actually had a chance to use it, but you may try i4o. Its supposed to provide indexes for in-memory objects for usage with Linq. You specify the indexes for a class using either attributes or as part of constructing the indexer, then you create an IndexableCollection.

At that point, you just query the collection using Linq, and the indexes work behind the scenes to optomize the access patterns for the data.

Chris 2010-01-27 01:11:03

Sounds promising; I'll have a look at it...

pbz 2010-01-27 01:24:46

pbz 2010-01-28 00:24:20

Answer 6

+4 A:

Don't ever forget this principle: Make it correct, make it clear, make it concise, make it fast. In that order. So, first code up the naive implementation:

static IEnumerable<T> GetByIndex<T>(
    List<T> list,
    Func<T, TIndex> func,
    TIndex key
) {
    return list.Where(x => func(x) == key);
}

Usage:

List<Test> tests = new List<Test>() {
            new Test { Name = "aaa", Value = 111, Valid = Valid.Yes },
            new Test { Name = "aaa", Value = 111, Valid = Valid.Yes },
            new Test { Name = "bbb", Value = 112, Valid = Valid.No },
            new Test { Name = "bbb", Value = 111, Valid = Valid.No },
            new Test { Name = "bbb", Value = 220, Valid = Valid.No },
            new Test { Name = "ccc", Value = 220, Valid = Valid.Yes }
};
IEnumerable<Test> lookup = GetByIndex(tests, x => x.Name, "bbb");

The above is correct, clear and concise. Almost surely it is fast enough for your purposes.

So, as far as making it fast you must first measure:

Establish reasonable performance criterion.
Establish a test-bed of real-world data.
Profile the simple approach against the test-bed of real-world data. Note here that profiling includes deducing whether or not this functionality is a bottleneck in your application.

Then, if and only if this is not fast enough for you should you try to optimize. It wouldn't be too hard to implement an IndexedList<T> : ICollection<T> that would allow you to index off of various properties.

Here is a naive implementation that could get you started:

class IndexedList<T> : IEnumerable<T> {
    List<T> _list;
    Dictionary<string, Dictionary<object, List<T>>> _dictionary;
    Dictionary<string, Func<T, object>> _propertyDictionary;

    public IndexedList(IEnumerable<string> propertyNames) : this(propertyNames, new List<T>()) { }

    public IndexedList(IEnumerable<string> propertyNames, IEnumerable<T> source) {
        _list = new List<T>();
        _dictionary = new Dictionary<string, Dictionary<object, List<T>>>();
        _propertyDictionary = BuildPropertyDictionary(propertyNames);
        foreach (var item in source) {
            Add(item);
        }
    }

    static Dictionary<string, Func<T, object>> BuildPropertyDictionary(IEnumerable<string> keys) {
        var propertyDictionary = new Dictionary<string,Func<T,object>>();
        foreach (string key in keys) {
            ParameterExpression parameter = Expression.Parameter(typeof(T), "parameter");
            Expression property = Expression.Property(parameter, key);
            Expression converted = Expression.Convert(property, typeof(object));
            Func<T, object> func = Expression.Lambda<Func<T, object>>(converted, parameter).Compile();
            propertyDictionary.Add(key, func);
        }
        return propertyDictionary;
    }

    public void Add(T item) {
        _list.Add(item);
        foreach (var kvp in _propertyDictionary) {
            object key = kvp.Value(item);
            Dictionary<object, List<T>> propertyIndex;
            if (!_dictionary.TryGetValue(kvp.Key, out propertyIndex)) {
                propertyIndex = new Dictionary<object, List<T>>();
                _dictionary.Add(kvp.Key, propertyIndex);
            }
            List<T> list;
            if (!propertyIndex.TryGetValue(key, out list)) {
                list = new List<T>();
                propertyIndex.Add(key, list);
            }
            propertyIndex[key].Add(item);
        }
    }

    public IEnumerable<T> GetByIndex<TIndex>(string propertyName, TIndex index) {
        return _dictionary[propertyName][index];
    }

    public IEnumerator<T> GetEnumerator() {
        return _list.GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator() {
        return GetEnumerator();
    }
}

Usage:

List<Test> tests = new List<Test>() {
            new Test { Name = "aaa", Value = 111, Valid = Valid.Yes },
            new Test { Name = "aaa", Value = 111, Valid = Valid.Yes },
            new Test { Name = "bbb", Value = 112, Valid = Valid.No },
            new Test { Name = "bbb", Value = 111, Valid = Valid.No },
            new Test { Name = "bbb", Value = 220, Valid = Valid.No },
            new Test { Name = "ccc", Value = 220, Valid = Valid.Yes }
};
// build an IndexedList<Text> indexed by Name and Value
IndexedList<Test> indexed = new IndexedList<Test>(new List<string>() { "Name", "Value" }, tests);
// lookup where Name == "bbb"
foreach (var result in indexed.GetByIndex("Name", "bbb")) {
    Console.WriteLine(result.Value);
}

But see, the reason you don't do this unless the naive implementation is not already fast enough is because of the additional complexity you just added to your system. You just added new code to maintain, new code to test and might not gain anything if this isn't faster on your real-world data or is not a bottleneck of your application.

Jason 2010-01-27 01:34:16

Answer 7

A:

I know you said you couldn't use a Dictionary, but would the following work?

For your example data set:

{ "aaa", 111, "yes" }
{ "aaa", 112, "no"  }
{ "bbb", 111, "no"  }
{ "bbb", 220, "yes" }
{ "bbb", 220, "no"  }
{ "ccc", 300, "yes" }

You could use the following:

var p1Lookup = new Dictionary<string,int []>();
p1Lookup.Add( "aaa", new int [] {0, 1} );
p1Lookup.Add( "bbb", new int [] {2, 3, 4} );
p1Lookup.Add( "ccc", new int [] {5} );

var p2Lookup = new Dictionary<int,int []>();
p1Lookup.Add( 111, new int [] {0, 2} );
p1Lookup.Add( 112, new int [] {1} );
p1Lookup.Add( 220, new int [] {3, 4} );
p1Lookup.Add( 300, new int [] {5} );

var p3Lookup = new Dictionary<int,int []>();
p1Lookup.Add( "yes", new int [] {0, 3, 5} );
p1Lookup.Add(  "no", new int [] {1, 2, 4} );

Depending on the usage, you could build the look-up dictionaries just once

Joseph Gordon 2010-01-27 04:29:37

ansaurus

tags:

views:

answers:

List with multiple indexes

related questions