tags:

views:

84

answers:

1

Hi, I am looking for a family of hash function F1,..Fn, where each Fi maps any key in [0,1]. My first implementation was Fi(k) = F(k,i) = hash(i,hash(k,0)),. Here hash is the hashlittle function provided here (http://burtleburtle.net/bob/c/lookup3.c). I haven't looked under the hood of what exactly hashlittle does.

As sharp readers would have noticed, this will fails. My question is how to achieve this efficiently. My objective is to minimize, on average, the largest i for which Fi(k1) == Fi(k2) for any given k1,k2 pair. Of course it should be fast too..

Thanks
Sandeep

+3  A: 

Well, I've looked under the hood a bit.

uint32_t hashlittle( const void *key, size_t length, uint32_t initval)
{
  union { const void *ptr; size_t i; } u;     /* needed for Mac Powerbook G4 */

  u.ptr = key;
  if (HASH_LITTLE_ENDIAN && ((u.i & 0x3) == 0)) {

Writing u.ptr and then reading u.i is undefined behaviour.

EDIT

I think I understand now. You basically need hash functions that take two parameters as input. You can use nearly any hash function for this.

A hash function takes a data packet of an arbitrary bit size and transforms it into a data packet of a fixed bit size:

hashval = Hash(data, len);

You need a function where an additional parameter is given and used within the transformation, right?

hashval = Hash(data, len, addval);

The simplest way is to concatenate the additional value to the data packet:

memcpy((char *)data + len, &addval, sizeof(addval));
hashval = Hash(data, len + sizeof(addval));

If you have the source available, another way is to modify it to use the new parameter as initialization for the internal hash calculation. This is what was done in hashlittle.

Before:
uint32_t Hash (const void *data, size_t len)
{
    uint32_t hashval = 0;
    ....
    return (hashval);
}

After:
uint32_t Hash (const void *data, size_t len, uint32_t init)
{
    uint32_t hashval = init;
    ....
    return (hashval);
}

This option may be a bit harder to do, as the internal state can be much more than a single hashval, and the initialization can be quite sophisticated instead of simply using a 0. In hashlittle it is:

/* Set up the internal state */
a = b = c = 0xdeadbeef + ((uint32_t)length) + initval;
Secure
I am afraid I don't quite understand. Are you saying that functions in lookup3.c is not implemented correctly. My question is not really about usage of lookup3.c but that of what is appropriate choice for hash function in this scenario. What I documented here was my current approach.-Sandeep
Sandeep
"Not implemented correctly"... you could say so, yes. I think it is a perfect example of the evilness of premature optimization. He tried to get speed by reading the data in 32 bit units instead of byte by byte (which in itself is undefined behaviour again if the data wasn't initialized in this way) and had to jump through so many hoops, including separate functions for little and big endian machines, it is simply ridiculous. It seemed to be important enough for you since you've mentioned it, and I'm always interested in such things, so I took a look.
Secure
For your question, sorry, I don't understand it.
Secure
I see what you are trying to get at. As per the question, let me rephrase: I simply need a hash that maps a pair of integer (a,b) to [0,1]. thanks.
Sandeep
Thank you. This is working as expected.
Sandeep