Hi,
I am looking for a generator of hash function families: something that, given a set of parameters, produces a family of distinct hash functions. I haven't found any such generator so far. Is there a way to do that with the hashlib package?
For example, I'd like to do something like:
    h1 = hash_function(1)
    h2 = hash_function(2)
    ...
and h1 and h2 would be different hash functions.
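To make the interface concrete, here is a minimal sketch of what such a hash_function could look like. It assumes the seed can be passed as the salt argument of hashlib.blake2b and that the inputs are non-negative integer indices below 2**64; the name hash_function and the 8-byte digest size are only illustrative choices, not an established API.

```python
import hashlib

def hash_function(seed):
    """Return a hash function h(x) -> int selected by an integer seed (hypothetical helper)."""
    # blake2b accepts a salt of at most 16 bytes, so the seed must fit in 16 bytes.
    salt = seed.to_bytes(16, "little")

    def h(x):
        # Assumption: x is a non-negative integer index below 2**64.
        data = x.to_bytes(8, "little")
        digest = hashlib.blake2b(data, digest_size=8, salt=salt).digest()
        return int.from_bytes(digest, "little")

    return h

h1 = hash_function(1)
h2 = hash_function(2)
```

With this sketch, h1(r) and h2(r) are both integers in the range 0 to 2**64 - 1, and each seed gives a differently salted function.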
For those of you who might know about it, I am trying to implement a min-hashing algorithm on a very large dataset.
Basically, I have a very large set of features (100 million to 1 billion) for a given document, and I need to create 1000 to 10000 different random permutations of this set of features.
I do NOT want to build the random permutations explicitly, so the technique I would like to use is the following: generate a hash function h and consider that, for two indices r and s, r appears before s in the permutation if h(r) < h(s), and do that for 100 to 1000 different hash functions.
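As a rough sketch of this idea, the snippet below uses a family of linear hash functions h(x) = (a*x + b) mod P, a standard universal-hashing construction swapped in here for illustration rather than anything provided by hashlib. The names P, make_hash_family, precedes and minhash_signature are hypothetical, and such linear hashes only approximate truly random permutations.

```python
import random

# A Mersenne prime; assumed to be larger than any feature index.
P = (1 << 61) - 1

def make_hash_family(k, seed=0):
    """Draw k pairs (a, b), each defining h(x) = (a*x + b) % P."""
    rng = random.Random(seed)
    return [(rng.randrange(1, P), rng.randrange(0, P)) for _ in range(k)]

def precedes(r, s, a, b):
    """True if r comes before s in the ordering implied by (a, b)."""
    return (a * r + b) % P < (a * s + b) % P

def minhash_signature(features, family):
    """For each hash function, keep the smallest hash value over the features.

    The feature reaching that minimum is the one that would come first
    in the implied permutation, so no permutation is ever materialised.
    """
    return [min((a * r + b) % P for r in features) for a, b in family]

family = make_hash_family(5, seed=1)
signature = minhash_signature({3, 10, 42}, family)
```

The features are only iterated over, never reordered or copied, so the memory cost is one integer per hash function regardless of the size of the feature set.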
Are there any known libraries that I might have missed? Or is there a standard way of generating families of hash functions with Python that you might be aware of?
Thanks for your help,
Best,
Nicolas.