ansaurus

Question

Answer 1

+5 A:

No, there is no case invariant version of the hash function. Lower or upper case all strings before passing them to the hash function.

Niels Castle 2009-10-30 11:48:45

That's what I wrote in my last paragraph...

Smasher 2009-10-30 12:10:41

Yes, and I am confirming it as an solution.

Niels Castle 2009-10-30 14:25:05

Answer 2

+2 A:

It would be slightly faster, but it hurts your maintainability a lot. There is rarely a good reason for this type of micro-optimization. Just convert your strings to lower or upper case before hashing like you suggested.

"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%. A good programmer will not be lulled into complacency by such reasoning, he will be wise to look carefully at the critical code; but only after that code has been identified" - Donald Knuth

Thorarin 2009-10-30 12:03:50

+1 for Knuth's FULL quote. So often we only get the "premature optimization is the root of all evil" part.

Gerry 2009-10-30 21:40:54

The Knuth quote is good, but you should have taken it to heart yourself, especially the "after that code has been identified" part - what makes you say it would be slightly faster? I don't think so. And both our opinions are essentially worthless without profiling the code.

mghie 2009-10-31 10:21:53

Answer 3

+2 A:

IMO the whole question is wrong. To quote the Wikipedia article on hash functions:

A hash function is any well-defined procedure or mathematical function which converts a large, possibly variable-sized amount of data into a small datum, usually a single integer that may serve as an index to an array.

Note the "amount of data" - there is no type specified, and indeed the Bob Jenkins hash function has an untyped parameter const Data pointing to the data to be hashed. Since the input data is not necessarily a sequence of characters there is no way to compute a "case-insensitive" hash value. And even if it were a sequence of characters - the upper- or lowercasing would depend on character set and encoding. So you would need to have different hash functions for ASCII strings, UTF-8 encoded strings, UTF-16 LE encoded strings, ... (you get the idea).

mghie 2009-10-30 12:30:50

hmm...that seems right...okay then, i will go for the upperCase then...or the unicode version of that

Smasher 2009-10-30 13:31:39

ansaurus

tags:

views:

answers:

Case-insensitive Bob Jenkins Hash?

related questions