I'm developing a programming language, and in my programming language, I'm storing objects as hash tables. The hash function I'm using is Pearson Hashing, which depends on a 256-bit lookup table. Here's the function:
char* pearson(char* name, char* lookup)
{
char index = '\0';
while(*name)
{
index = lookup[index ^ *name];
name++;
}
return index;
}
My question is, given a fixed group of fewer than 256 member names, how can one determine a lookup
table such that pearson()
will return unique characters within a contiguous range starting from '\0'
. In other words, I need an algorithm to create a lookup table for a perfect hash. This will allow me to have objects that take up no more space than the number of their members. This will be done at compile time, so speed isn't a huge concern, but faster would be better. It would be easy to brute force this, but I think (hope) there's a better way.
Here's an example: given member variables 'foo', 'bar', and 'baz' in a class, I want to determine a lookup
such that:
pearson('foo',lookup) == (char) 0
pearson('bar',lookup) == (char) 1
pearson('baz',lookup) == (char) 2
Note that the order doesn't matter, so the following result would also be acceptable:
pearson('foo',lookup) == (char) 2
pearson('bar',lookup) == (char) 0
pearson('baz',lookup) == (char) 1
In an ideal world, all names that aren't in the table would return a value greater than 2 because this would allow me to avoid a check and possibly even avoid storing the member names, but I don't think this is possible, so I'll have to add an extra check to see if it's in the table. Given this, it probably would save time to not initialize values in the lookup table which aren't used (collisions don't matter, because if it collides and fails the check, it isn't in the object at all, so the collision doesn't need to be resolved; only the error needs to be handled).