ansaurus

Question

How would you go about designing a function for a perfect hash?

Answer 1

A:

You can use map

std::string foo() { return "Foo"; }
std::string bar() { return "Bar"; }

int main()
{
   std::map<std::string, std::string (*)()> m;
   m["foo"] = &foo;
   m["bar"] = &bar; 
}

Vinay 2009-04-09 15:35:58

std::map doesn't use a hash - it is tree based

anon 2009-04-09 15:37:18

who voted this up?!?

Mitch Wheat 2009-04-09 15:40:09

why to invent the wheel, you can use existing libraries like map.

Vinay 2009-04-09 15:41:40

perhaps the questioner wanted the performance characteristics of hashing rather than those of tree searching?

anon 2009-04-09 15:43:23

@Mitch: what's the problem? We should often ask ourselves not how to do this or that but why I want to use this or that.

Mykola Golubyev 2009-04-09 15:43:26

Precisely. Why does EvilTeach want a hash function? There may be good reasons for exactly that, but I haven't seen them. For the general case of taking a string and executing a function, this works fine.

David Thornley 2009-04-09 16:02:34

For the reason stated by Niel. Performance.

EvilTeach 2009-04-09 16:13:06

Answer 2

+15 A:

See the gperf home page.

anon 2009-04-09 15:35:58

The gulling part is that there is a link to this at the bottom of the wikipedia page.

EvilTeach 2009-04-09 15:49:21

Answer 3

+1 A:

See:

What is a good Hash Function?

Best hashing algorithm in terms of hash collisions and performance

What is a performant string hashing function that results in a 32 bit integer with low collision rates?

Choosing a multiplier for a (string) hash function

Very low cost hash function

What’s the best hashing algorithm to use on a stl string when using hash_map?

Mitch Wheat 2009-04-09 15:37:12

Answer 4

A:

If collisions are absolutely not allowed, your only option is to keep track of every string in the database, which is probably not a best way to go.

What I would do is apply one of existing common strong hashing algorithms, such as: MD5 or SHA. There's miriads of samples all around, here's one for example: http://www.codeproject.com/KB/security/cryptest.aspx

galets 2009-04-09 15:41:26

Answer 5

A:

Well, there is no perfect hash function.

You have several that minimize collisions, but none eliminates them.

Can't advise one though :P

EDIT: The solution can't be finding a perfect hash function. The solution is to be aware of collisions. Generically a hash function has collisions. That obviously depends on the dataset and the size of the resulting hash code.

Megacan 2009-04-09 15:41:49

http://en.wikipedia.org/wiki/Perfect_hashing

Adam Rosenfield 2009-04-09 15:43:44

@Adam: There's a pretty big caveat there given that it only applies when there is a distinct data set. Since the OP didn't make any mention of limiting the strings being used I'd agree with Megacan that there is no perfect hash in this case. +1.

sipwiz 2009-04-09 15:47:58

The questioner does make that mention, at least implicitly - there were only four Beatles )or siz if you include the drummer they sacked and Stu whatsisname) - still, a fixed data set.

anon 2009-04-09 15:52:27

I was only saying, the solution can't be finding a perfect hash function. The solution is to be aware of collisions. Generically a hash function has collisions. That obviously depends on the dataset and the size of the resulting hash code.

Megacan 2009-04-09 16:00:03

Yes it will be a fixed data set, there are only so many strings he could type in a lifetime :). The point here though is that "there is no perfect hash function" is a truism unless there are explicit constraints specified. Megacan was logoically correct in my book.

sipwiz 2009-04-09 16:02:30

@megacan - follow the link in the answer I posted - perfect hash functions are possible and quite widely used

anon 2009-04-09 16:06:06

@spi You are correct. I should make it clear that that the table is static.

EvilTeach 2009-04-09 16:14:51

@Neil - I've read the link. It says that a perfect hash function maps a key to a single hashcode, and no 2 keys have the same hash? am I right? I'm not arguing with that. That can only be possible if the number of combinations of keys are less or equal than the number of hashcodes.

Megacan 2009-04-09 16:35:05

Answer 6

A:

Use a balanced binary tree. Then you KNOW behaviour is ALWAYS O(logn).

I strongly dislike hashes. People don't realise how much risk they take with their algorithm. They run some test data and then deploy in the field. I have NEVER seen a deployed hash algorithm get checked for behaviour in the field.

O(log n) is almost always acceptable in lieu of O(1).

Blank Xavier 2009-04-09 16:03:27

“O(log n) is almost always acceptable in lieu of O(1).” In many applications, this statement is flat out wrong. Just increase the number of data points to above a few millions to see this.

Konrad Rudolph 2009-04-09 16:08:10

Once you've done that, test. Hashes do not give guaranteed results, unless you know in advance what all possible inputs can be. A hash function that tends to clump the input will likely not give you O(1).

David Thornley 2009-04-09 16:13:06

In this case, all the inputs are known. They are sitting in the array.and input string is either an exact match, or a no-match.

EvilTeach 2009-04-09 16:18:56

No, all the dispatchable inputs are known. Once you've hashed the input string, and gotten a hit, you do need to compare.

David Thornley 2009-04-09 17:11:31

Konrad - yes. But the number of applications with so many datum are few and far between. They form a specific, rather rare class. This is why I said *almost always acceptable*.

Blank Xavier 2009-04-09 17:26:08

@david, ya, absolutely correct.

EvilTeach 2009-04-09 19:58:29

Answer 7

+1 A:

The summary lists both C and C++. Which of them are you looking for? C and C++ are two distinct languages, and differ greatly in their string handling and data structures (and the fact that the C ones work in C++ doesn't change that).

Why, specifically, do you want a perfect hash function? Is it that you want to associate a string with a function, and thought that would be a good way to do it? Is this some sort of homework assignment? Do you have a reason not to use map<> in C++? (Or unordered_map<> if available?)

If you do need a perfect hash, what are the constraints on the strings? Will there be a certain fixed set you want to dispatch on? What about strings that don't match one of the set? Are you willing to accept hits from random strings, or is the number of incoming strings limited?

If you could edit your question to include information like that, we could be a lot more helpful.

EDIT (in response to the first two comments):

OK, we should look at C solutions, since you presumably want this for both your C and C++ work. You presumably want the performance, but have you tested? If we're dealing with strings coming in on the I/O system, the time there is likely to dwarf the dispatch time.

You are expecting arbitrary strings. It's a bit much to expect a perfect hash function that will avoid all collisions from random data, so you do need to consider that.

Have you considered a trie? It may be more efficient than a perfect hash function (or may not be), it should be fairly easy to implement in C, and it will avoid problems with redoing your list of dispatching strings or possible collisions.

David Thornley 2009-04-09 16:10:29

I code in both c and c++, and god help me Pro*C.O(1) hashing for performance.Lol, no homework assignment. I am looking to build a tool to speed up some performance critical code. The example is made simple for discussion purposes. The real world use is not.

EvilTeach 2009-04-09 16:21:26

The strings will very in length. None of them will be of length zero.As a practical limit, no string in the array will be longer than 32 characters. What the caller passes in can be of any length, but if it is longer than the strings in the table it is the case of a no-match

EvilTeach 2009-04-09 16:23:28

ansaurus

tags:

views:

answers:

How would you go about designing a function for a perfect hash?

related questions