tags:

views:

236

answers:

2

Does anyone have any good intuition for a good hash function for a sparse bit vector? To give a concrete example, say I want to hash a 4096 bit integer where the probability of each bit being 1 is 10%.

I want to get some compression in the hash. For example 4096 bits in and 32 bits out. This is just an example to illustrate what I am looking for. Of course, all answers are very much appreciated.

+2  A: 

Would a Bloom filter help?

If the bit vector is 2^32 bits, then why not just use a 32 bit integer?

Matt Curtis
You may want to amend your statement "if the bit vector is 2^32 bits" ;]
Daniel LeCheminant
Interesting. Do you have any idea of what to use for the hash functions for the Bloom Filter?
DasBoot
Say for a 32 bit integer (as per original edit), you could have a size 2^16 bit vector, and set bit n if integer n/(2^16) is set. Then if you want to know if x is in your original data, you know "yes or maybe" (never "no") if bit x/(2^16) is set. Works if the set is sparse and expensive to search.
Matt Curtis
Check out the book Programming Pearls for more info and ideas on squeezing data: <http://netlib.bell-labs.com/cm/cs/pearls/>
Matt Curtis
A: 

I would just hash the bits as usual by calling

hash<vector<bool>>(...)

if you're using C++0x, or else see boost::hash.

Neil G