I've got a huge list of entries (well over 10 million). Each entry offers two hash functions:
- Cheap: quick to compute, but its distribution is terrible (it may put 99% of the items in 1% of the hash space)
- Expensive: slow to compute, but with a much better distribution
An ordinary Dictionary lets me use only one of these hash functions. I'd like a Dictionary that uses the cheap hash function first, and only checks the expensive one on collisions.
It seems like a good idea to use a dictionary inside a dictionary for this. Right now I basically use this monstrosity:
Dictionary<int, Dictionary<int, List<Foo>>>;
I improved this design so that the expensive hash is only computed when two items actually share the same cheap hash.
It fits perfectly and does a flawless job for me, but it looks like something that should have died 65 million years ago.
To my knowledge, this functionality is not included in the base framework. I am about to write a DoubleHashedDictionary class, but I wanted to get your opinion first.
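For what it's worth, here is a minimal sketch of what such a DoubleHashedDictionary might look like. This is only an assumption about the design, not existing framework code: T stands in for Foo, cheapHash/expensiveHash are placeholders for the two hash functions, and the single-item bucket is just one way to defer the expensive hash until a cheap-hash collision actually happens.

    using System;
    using System.Collections.Generic;

    // Sketch only, not an existing framework type. T stands in for Foo,
    // and the two delegates are placeholders for the cheap and expensive hash functions.
    public class DoubleHashedDictionary<T>
    {
        private sealed class Bucket
        {
            public T Single;                              // only meaningful while ByExpensive == null
            public Dictionary<int, List<T>> ByExpensive;  // created lazily on the first cheap-hash collision
        }

        private readonly Func<T, int> _cheap;
        private readonly Func<T, int> _expensive;
        private readonly Dictionary<int, Bucket> _buckets = new Dictionary<int, Bucket>();

        public DoubleHashedDictionary(Func<T, int> cheapHash, Func<T, int> expensiveHash)
        {
            _cheap = cheapHash;
            _expensive = expensiveHash;
        }

        public void Add(T item)
        {
            int cheap = _cheap(item);
            if (!_buckets.TryGetValue(cheap, out Bucket bucket))
            {
                // First item with this cheap hash: store it without touching the expensive hash.
                _buckets[cheap] = new Bucket { Single = item };
                return;
            }

            if (bucket.ByExpensive == null)
            {
                // Second item with this cheap hash: pay for the expensive hash now,
                // both for the deferred first item and for the new one.
                bucket.ByExpensive = new Dictionary<int, List<T>>();
                AddByExpensive(bucket, bucket.Single);
                bucket.Single = default(T);
            }
            AddByExpensive(bucket, item);
        }

        private void AddByExpensive(Bucket bucket, T item)
        {
            int expensive = _expensive(item);
            if (!bucket.ByExpensive.TryGetValue(expensive, out List<T> list))
            {
                bucket.ByExpensive[expensive] = list = new List<T>();
            }
            list.Add(item);
        }
    }

Lookup would mirror Add: check the cheap-hash bucket first, and only compute the expensive hash when that bucket has already been promoted to an inner dictionary.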
As for my specific case:
First hash function = number of files in a file system directory (fast)
Second hash function = sum of the sizes of those files (slow)
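To make that concrete, the two functions could look roughly like this (just a sketch; I'm assuming each entry is identified by a directory path, and the Directory/DirectoryInfo calls are only one way to get at the counts and sizes):

    using System.IO;
    using System.Linq;

    // Hypothetical illustration only; assumes each entry wraps a directory path.
    static class DirectoryHashes
    {
        public static int CheapHash(string path)
        {
            // Number of files: a single directory listing, fast but badly distributed.
            return Directory.GetFiles(path).Length;
        }

        public static int ExpensiveHash(string path)
        {
            // Total size of the files: needs per-file metadata, hence much slower.
            long totalSize = new DirectoryInfo(path).GetFiles().Sum(f => f.Length);
            return totalSize.GetHashCode();
        }
    }

These would be the two delegates handed to the DoubleHashedDictionary sketch above.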
Edits:
- Changed the title and added more information.
- Added a quite important missing detail