Could anyone explain me why the generics list's Contains() function is so slow?
I have a List with about a million numbers, and the code that is constantly checking if there's a specific number within these numbers.
I tried doing the same thing using Dictionary and the ContainsKey() function, and it was about 10-20 times faster than with the List.
Of course, I don't really want to use Dictionary for that purpose, because it wasn't meant to be used that way.
So, the real question here is, is there any alternative to the List.Contains(), but not as whacky as Dictionary.ContainsKey() ?
Thanks in advance!
views:
2896answers:
6Dictionary isn't that bad, because the keys in a dictionary are designed to be found fast. To find a number in a list it needs to iterate through the whole list.
Of course the dictionary only works if your numbers are unique and not ordered.
I think there is also a HashSet<T>
class in .NET 3.5, it also allows only unique elements.
List.Contains is a O(n) operation.
Dictionary.ContainsKey is a O(1) operation, since it uses the hashcode of the objects as a key, which gives you a quicker search ability.
I don't think that it 's a good idea to have a List which contains a million entries. I don't think that the List class was designed for that purpose. :)
Isn't it possible to save those millon entities into a RDBMS for instance, and perform queries on that database ?
If it is not possible, then I would use a Dictionary anyway.
A SortedList will be faster to search (but slower to insert items)
Why is a dictionary inappropriate?
To see if a particular value is in the list you need to walk the entire list. With a dictionary (or other hash based container) it's much quicker to narrow down the number of objects you need to compare against. The key (in your case, the number) is hashed and that gives the dictionary the fractional subset of objects to compare against.
If you are just checking for existance, HashSet<T>
in .NET 3.5 is your best option - dictionary-like performance, but no key/value pair - just the values:
HashSet<int> data = new HashSet<int>();
for (int i = 0; i < 1000000; i++)
{
data.Add(rand.Next(50000000));
}
bool contains = data.Contains(1234567); // etc
I'm using this in the Compact Framework where there is no support for HashSet, I have opted for a Dictionary where both strings are the value I am looking for.
It means I get list<> functionality with dictionary performance. It's a bit hacky, but it works.