Which do you prefer and why?
They both can be used to accomplish similar tasks but I'm curious as to see what people have used in actual applications and their reasoning for doing so.
Which do you prefer and why?
They both can be used to accomplish similar tasks but I'm curious as to see what people have used in actual applications and their reasoning for doing so.
I prefer cuckoo hashing.
I am wary of the false positives that may show up with bloom filters at higher fill factors.
Have used cuckoo hashing in an application where we had very large hash tables and were running into memory pressure issues.
Please see my eCollections library at http://codeplex.com/ecollections for the implementation of a variant of cuckoo hashing.
Kind regards,
If I can tolerate the false positives and space is critical, I use a Bloom filter because it takes less space. Otherwise, I use a hash.
Which do you prefer, wine or cheese? A Bloom filter is for when you have limited space, high query cost, and mostly negative queries. In that case, a BF with 8 bits per key and 4 hash functions gives you 2.5% false positive rate; you process queries nearly 40 times faster than before, at the cost of 1 byte per key. On the other hand, if any of the previous conditions do not hold, a hash table acting as a cache makes sense, though it obviously will take a lot more than one byte per entry :-) You can even skip over the hard edge cases of cuckoo hashing if it's a cache. That also makes the size-increase problems of cuckoo hash tables (or anything other than linear hash) moot. my2c