You might be OK if you're sure that you intend to count the number of instances of an exact float value.
As David says, the inherent problem of a hashtable keyed on floats is that hashtables use equality to identify keys, and equality of floats is a slightly unreliable concept due to calculation errors. There is no general guarantee that `sin(pi / 6) == 0.5`, or even that `(2.0 / 3) * (2.0 / 3) == (4.0 / 9)`. In both cases, the LHS might be a bit or more different from the RHS.
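For concreteness, here's what that looks like in Python (the exact values are platform-dependent, but on a typical IEEE-754 double implementation both comparisons come out `False`):

```python
import math

# pi/6 is not exactly representable, and sin() rounds its result,
# so the answer is typically one ULP below 0.5.
print(math.sin(math.pi / 6) == 0.5)      # False on typical platforms
print(math.sin(math.pi / 6))             # e.g. 0.49999999999999994

# Each division and multiplication rounds independently, so the
# two sides accumulate different errors.
print((2.0 / 3) * (2.0 / 3) == 4.0 / 9)  # False here too
```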
So, if some of the entries you're counting are entered as `0.5`, and some are computed as `sin(pi / 6)`, and you want them to be counted together, then you need to do more than just hash on a float value.
You might get away with rounding and then hashing, although you never escape the issue entirely. For instance, if you round to the nearest 0.001, then you'll identify 0.2020001 and 0.2020003 as being "the same value, with a calculation error", but not the equally close pair 0.1014999 and 0.1015001, which straddle a rounding boundary. I've used base-10 examples for ease of typing, but of course "float" generally means a binary representation.
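A minimal sketch of that round-then-hash approach, using Python's `Counter` (the `bucket` helper and the 3-digit precision are just illustrative choices):

```python
import math
from collections import Counter

def bucket(x, ndigits=3):
    # Round to a fixed precision and use the result as the hash key.
    return round(x, ndigits)

values = [0.5, math.sin(math.pi / 6),  # should count together
          0.2020001, 0.2020003,        # merged by rounding
          0.1014999, 0.1015001]        # split by the 0.1015 boundary
print(Counter(bucket(v) for v in values))
# Typically: Counter({0.5: 2, 0.202: 2, 0.101: 1, 0.102: 1})
```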
Exactly the same issue would apply with a binary tree. Hashtables don't really care what their key data "is", they just care that someone can provide a function `h` which maps keys to integers, such that for any `x` and `y` you want to consider "equal", `h(x) == h(y)`. Then for performance, you want `h` to introduce no more "collisions" (instances of `h(x) == h(y)` where `x != y`) than random chance. There's no obstacle at all to doing this with floats. You must ensure that you don't include anything in the hash that doesn't participate in comparisons, and it helps if you include all the information that does participate in comparisons.
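In Python terms, that rule is the usual `__eq__`/`__hash__` contract. A sketch of a hypothetical wrapper type that keeps the two consistent by deriving both from the rounded value:

```python
import math

class BucketedFloat:
    """Hypothetical key type: equality and hashing both use the
    rounded value, so x == y always implies hash(x) == hash(y)."""
    def __init__(self, value, ndigits=3):
        self.raw = value                   # not part of comparisons...
        self._key = round(value, ndigits)  # ...so it must not be hashed

    def __eq__(self, other):
        return isinstance(other, BucketedFloat) and self._key == other._key

    def __hash__(self):
        return hash(self._key)             # hash only what __eq__ compares

print(BucketedFloat(0.5) == BucketedFloat(math.sin(math.pi / 6)))              # True
print(hash(BucketedFloat(0.5)) == hash(BucketedFloat(math.sin(math.pi / 6))))  # True
```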
If you can pin down what it is you're actually counting, that might lead you to the data structure you need. If you do want some tolerance in the matches, you might just be best off sorting all your floats and then looking for clusters of values.
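A sketch of that sort-and-cluster approach (the `gap` threshold is an assumption you'd tune to your data):

```python
import math

def clusters(values, gap=1e-6):
    # Sort, then start a new cluster whenever the jump from the
    # previous value exceeds the tolerance.
    out = []
    for v in sorted(values):
        if out and v - out[-1][-1] <= gap:
            out[-1].append(v)
        else:
            out.append([v])
    return out

print(clusters([0.5, math.sin(math.pi / 6), 0.25, 0.75]))
# Typically: [[0.25], [0.49999999999999994, 0.5], [0.75]]
```

Note that clustering sidesteps the fixed-boundary problem above: values are grouped by proximity to each other, not by which bucket they happen to round into.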