Hashtables/Dictionaries that use floats/doubles

A hash algorithm is, in general terms, just a function that produces a smaller output from a larger input. Good hash functions have interesting properties like a large change in output for a small change in the input, and an assurance that they produce every possible output value for some input.

It's not hard to write a simple polynomial-type hash function that outputs a floating-point value rather than an integer value, but it's difficult to ensure that the resulting hash function has the desired properties without getting into the details of the particular floating-point representation used.

At least part of the reason that hash functions are nearly always implemented in integer arithmetic is that proving various properties about an integer calculation is easier than doing the same for a floating-point calculation.

It's fairly easy to prove that some (sum of prime factors) modulo (another prime) must, necessarily, produce every possible output for some input. Doing the same for a calculation with a bunch of floating-point fractions would be a drag.
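As a rough illustration of why the integer case is easy to reason about, here is a minimal Python sketch of a polynomial hash reduced modulo a prime. The multiplier `B` and modulus `P` are arbitrary values picked for the example, not taken from any particular library. For single-byte inputs the hash is just `byte % P`, so checking that every output value is reachable is trivial.

```python
P = 101  # small prime modulus, chosen only for illustration
B = 31   # hypothetical multiplier, also chosen only for illustration


def poly_hash(data: bytes) -> int:
    """Polynomial hash of a byte string, reduced modulo the prime P."""
    h = 0
    for byte in data:
        h = (h * B + byte) % P
    return h


# Single-byte inputs already reach every residue 0..P-1,
# because poly_hash(bytes([b])) is simply b % P.
outputs = {poly_hash(bytes([b])) for b in range(256)}
covers_all = outputs == set(range(P))
```

The same coverage check on a floating-point version would have to account for rounding at every step, which is the "drag" referred to above.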

Add to that the relative difficulty of storing and transmitting floating-point values without corruption, and it's just not worth it.

Good answer. Indeed, keys in hashtables are nothing more than bit arrays, and an `int` type is simply the most convenient way of representing this.

Noldorin 2009-06-03 18:10:47

Yeah: using a long as a hash is, technically, the same as using a double (both are 64-bit arrays). If you wanted longer, you could go to a 128-bit type, such as a GUID (which is the same size as decimal in .NET). Math is often faster on integer types than on floating-point types, though.

Reed Copsey 2009-06-03 18:17:07
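To make the "same 64 bits" point concrete, here is a small Python sketch that reinterprets the bytes of an IEEE 754 double as a signed 64-bit integer and back. The helper names are made up for this example. It also shows one reason floating-point keys need care: 0.0 and -0.0 compare equal but have different bit patterns.

```python
import struct


def double_to_int64_bits(x: float) -> int:
    """Reinterpret the 8 bytes of an IEEE 754 double as a signed 64-bit int."""
    return struct.unpack(">q", struct.pack(">d", x))[0]


def int64_bits_to_double(n: int) -> float:
    """Reinterpret a signed 64-bit int as an IEEE 754 double."""
    return struct.unpack(">d", struct.pack(">q", n))[0]


bits = double_to_int64_bits(302.298)
round_trip = int64_bits_to_double(bits)   # same value back, bit for bit

# Equal values, different bit patterns: a hash built directly on the raw
# bits would put 0.0 and -0.0 in different buckets unless it special-cases them.
zero_bits = double_to_int64_bits(0.0)
neg_zero_bits = double_to_int64_bits(-0.0)
```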

Thanks Reed. Yeah, I was wondering about this as a way to get a larger set of hash values. So using double in a .NET dictionary would still use ints, right?

Joan Venge 2009-06-03 18:52:19

@Joan Venge: Yes. All of the hashable types use Object.GetHashCode(), which returns an int. Having a double as a key just does (302.298).GetHashCode(). This means you have a 32-bit hash in .NET (using the standard classes). This would be the same as a float, but in theory, a long could give you 2x more possible hash values, and a GUID could do 4x.

Reed Copsey 2009-06-03 18:59:09
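A 64-bit double has to be squeezed into a 32-bit hash somehow. One common way to do that is to XOR the upper and lower 32-bit halves of the bit pattern. The Python sketch below shows that folding scheme; it is an illustration of the idea only and is not claimed to be what .NET's Double.GetHashCode() actually does. The function name is made up for the example.

```python
import struct


def fold_double_to_32(x: float) -> int:
    """Fold the 64-bit pattern of a double into 32 bits by XOR-ing its halves."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]  # unsigned 64-bit view
    return (bits ^ (bits >> 32)) & 0xFFFFFFFF


h = fold_double_to_32(302.298)   # always lands in the range 0 .. 2**32 - 1

# Two different 64-bit patterns that fold to the same 32-bit value:
a = struct.unpack(">d", struct.pack(">Q", 0x0000000100000000))[0]
b = struct.unpack(">d", struct.pack(">Q", 0x0000000000000001))[0]
same_hash = fold_double_to_32(a) == fold_double_to_32(b)
```

Since 2^64 bit patterns are mapped onto 2^32 hash values, collisions like the pair above are unavoidable whatever folding scheme is used.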

Thanks Reed. You mean having a hashfunction to return a GUID would give me 4x possible hash values?

Joan Venge 2009-06-03 19:01:45

No. You'd need to write a custom hashing algorithm for your key that returned a GUID, and a custom Dictionary/HashTable class that internally used a GUID instead of an int. All of the BCL hashing routines end up using an int for the hash... Still, they chose an int for good reason. Math on int is fast, and since it's 32-bit, you have 2^32 possible unique hash values. Dictionaries function even with non-unique hashing (just not as well), since hash calculations tend to have collisions in general.

Reed Copsey 2009-06-03 19:16:33
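The point that a dictionary still works with non-unique hashes can be seen with a deliberately bad key type. The sketch below uses Python's built-in dict rather than a .NET Dictionary, and the `BadKey` class is invented for the example, but the principle is the same: the hash only narrows the search, and equality decides the match.

```python
class BadKey:
    """A key whose hash is the same for every instance (worst case)."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __hash__(self) -> int:
        return 42  # every key collides with every other key

    def __eq__(self, other: object) -> bool:
        return isinstance(other, BadKey) and self.name == other.name


# All 100 keys share one hash value, yet every lookup is still correct.
table = {BadKey(f"k{i}"): i for i in range(100)}
value = table[BadKey("k7")]
missing = BadKey("nope") in table
```

Lookups stay correct, but each one has to compare against the colliding keys, so performance degrades as the table grows.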

If your goal is to get a better Dictionary, though - you're much, much better off making sure the GetHashCode() implementation for your keys follows the rules and avoids collisions as much as possible. No matter how large a hash type you use, the hashing will only be as good as your algorithm for computing hash codes.

Reed Copsey 2009-06-03 19:17:36

Thanks Reed. I just didn't understand this: "This would be the same as a float, but in theory, a long could give you 2x more possible hash values, and a GUID could do 4x." How does it affect the range of hash values?

Joan Venge 2009-06-03 19:21:45

By default, Dictionary in .NET uses int for its hash [it's `int System.Object.GetHashCode()`]. This means that it has 32 unique bits for the hash, so you get 2^32 values (which would be the same as float, since float is 32 bits). If you built a custom dictionary that used long (or double) types instead of int, you would have 64 unique bits. With GUID (or decimal) it'd be 128 unique bits. It's not 2x, it's actually 2^32 vs. 2^64 vs. 2^128 - I misspoke - it's 2x the bits, not 2x hash values.

Reed Copsey 2009-06-03 19:47:03

That being said, 4,294,967,296 possible hash values is really plenty - it'd be very tough to put enough entries into an in-memory dictionary for that not to be adequate. :) If you were designing a disk-based hashing routine, you might (possibly) need to use a long instead, since that'd give you 18,446,744,073,709,551,616.

Reed Copsey 2009-06-03 19:48:32

Thanks Reed. Now I got it.

Joan Venge 2009-06-03 20:08:54
