views:

280

answers:

4

I read somewhere about other data structures similar to hashtables, dictionaries but instead of using ints, they were using floats/doubles, etc.

Anyone knows what they are?

+8  A: 

If you mean using floats/doubles as keys in your hash, that's easy. For example, in .NET, it's just using Dictionary<double,MyValueType>.

If you're talking about having the hash be based off a double instead of an int....

Technically, you can have any element as your internal hash. Normally, this is done using an int or long, since these are fast, and the hashing algorithm is easy to compute.

However, the hash is really just a BitArray at heart, so anything would work. There really isn't much advantage to making this something other than an int or long, other than potentially allowing a larger set of hash values (ie: if you go to an 8 byte or larger type for your hash).

Reed Copsey
Good answer. Indeed, keys in hashtables are nothing more than bit arrays, and an `int` type is simply the most convenient way of representing this.
Noldorin
Yeah: using a long as a hash, technically, is the same as using a double (64 bit array). If you wanted longer, you could go to a 128 bit type, such as a GUID (which would be equivalent to decimal in .NET). Math is often faster on integer types than on floating point types, though.
Reed Copsey
Thanks Reed. Yeah I was wondering this to have a larger set of hash values. So using double in a .net dictionary would still use ints, right?
Joan Venge
@Joan Venge: Yes. All of the hashable types use Object.GetHashCode(), which returns an int. Having a double as a key just does (302.298).GetHashCode(). This means you have a 32bit hash in .NET (using the standard classes). This would be the same as a float, but in theory, a long could give you 2x more possible hash values, and a GUID could do 4x.
Reed Copsey
Thanks Reed. You mean having a hashfunction to return a GUID would give me 4x possible hash values?
Joan Venge
No. You'd need to write a custom hashing algorithm for your key that returned a GUID, and a custom Dicionary/HashTable class that internally used a GUID instead of an int. All of the BCL hashing routines end up using an int on the hashing... Still, they chose an int for good reason. Math on int is fast, and since it's 32bit, you have 2^32 possible unique hash values. Dictionaries function even with non-unique hashing (just not as well), since hash calculations tend to have collisions in general.
Reed Copsey
If your goal is to get a better Dictionary, though - you're much, much better off trying to make sure your GetHashCode() implementation for your keys in the dictionary follow the rules and avoid collisions as much as possible. No matter how large of a hashing element you use, the hashing will only be as good as your algorithm for compuing hash IDs.
Reed Copsey
Thanks Reed. I just didn't understand this:"This would be the same as a float, but in theory, a long could give you 2x more possible hash values, and a GUID could do 4x. "How does it affect hash value range set?
Joan Venge
By default, Dictionary in .NET uses int for it's hash [it's "int System.Object.GetHashCode()]. This means that it has 32 unique bits for the hash, so you get 2^32 values (which would be the same as float, since float is 32bits). If you built a custom dictionary that used long (or double) types instead of int, you would have 64 unique bits. With GUID (or decimal) it'd be 128 unique bits. It's not 2x, it's actually 2^32 vs. 2^64 vs. 2^128 - I misspoke - it's 2x the bits, not 2x hash values.
Reed Copsey
That being said, 4,294,967,296 possible hash values is really plenty - it'd be very tough to put enough entries into an in-memory dictionary where that wasn't adequate. :) If you were designing a hard-disk based hashing routine, you might (possibly) need to use a long, instead, since that'd give you 18,446,744,073,709,551,616.
Reed Copsey
Thanks Reed. Now I got it.
Joan Venge
A: 

Your question history shows that you use .Net, so I'll answer in that context.

If you want a Dictionary that is type aware, such that you can specify it should use floats or doubles for the keys or values, use System.Collections.Generic.Dictionary<T, U> http://msdn.microsoft.com/en-us/library/xfhwa508.aspx

If you want a Dictionary that is type blind, such that you can use floats AND doubles for keys and values, use System.Collections.HashTable http://msdn.microsoft.com/en-us/library/system.collections.hashtable.aspx

David B
+3  A: 

You mean as keys? That strikes me as tricky.

If you're using them as arbitrary keys, they're no better than integers.

If you expect to calculate a floating-point value and use it to look something up in a hash table, you're living very dangerously. Floating point numbers do not have infinite precision, and calculating the same thing in two slightly different ways can result in very tiny differences in the result. Hash keys rely on getting the exact same thing every time, so you'd have to be careful to round, and round in exactly the same way at all times. This is trickier than it sounds, by the way.

So, what would you do with floating-point hashes?

David Thornley
Thanks, just wondered about them for a larger hash value set. Although I remember seeing data stuctures that actually use them if I am not wrong.
Joan Venge
+1  A: 

A hash algorithm is, in general terms, just a function that produces a smaller output from a larger input. Good hash functions have interesting properties like a large change in output for a small change in the input, and an assurance that they produce every possible output value for some input.

It's not hard to write a simple polynomial type hash function that outputs a floating-point value, rather than an integer value, but it's difficult to ensure that the resulting hash function has the desired properties without getting into the details of the particular floating-point representation used.

At least part of the reason that hash functions are nearly always implemented in integer arithmetic is because proving various properties about an integer calculation is easier than doing the same for a floating point calculation.

It's fairly easy to prove that some (sum of prime factors) modulo (another prime) must, necessarily, produce every possible output for some input. Doing the same for a calculation with a bunch of floating-point fractions would be a drag.

Add to that the relative difficulty of storing and transmitting floating-point values without corruption, and it's just not worth it.

Mark Bessey