I have a struct called Point. Point is pretty simple:

struct Point
{
    Row row;
    Column column;

    // some other code for addition and subtraction of points is there too
};

Row and Column are basically glorified ints, but I got sick of accidentally transposing the input arguments to functions and gave them each a wrapper class.

Right now I use a set of points, but repeated lookups are really slowing things down. I want to switch to an unordered_set.

So, I want to have an unordered_set of Points. Typically this set might contain, for example, every point on an 80x24 terminal = 1920 points. I need a good hash function. I just came up with the following:

struct PointHash : public std::unary_function<Point, std::size_t>
{
    result_type operator()(const argument_type& val) const
    {
        return val.row.value() * 1000 + val.column.value();
    }
};

However, I'm not sure that this is really a good hash function. I wanted something fast, since I need to do many lookups very quickly. Is there a better hash function I can use, or is this OK?

A: 

I guess doing a bitshift by 10 instead would be more efficient than multiplying by 1000.

return (val.row.value() << 10) + val.column.value();
Brian R. Bondy
Don't prematurely optimize. (1) This is a micro-optimization, and it's unlikely to save you much time. (2) It just makes the code more obscure. (3) If your compiler's smart, you can pick a number like 1024 instead of 1000, and it will do the optimization automatically if it actually makes sense on your machine's instruction set.
Ken Bloom
@Ken: I do agree with not prematurely optimizing in general, but for a simple hashing function I don't agree. It's a hash function, i.e. a mathematical function.
Brian R. Bondy
Also, it doesn't matter whether `val.row.value() * 1000` overflows or lands on the same value as some other point, because this is a hash code: the only reason to compute it is to put the points in a somewhat random location in the hash table. An occasional overlap doesn't hurt.
Ken Bloom
I agree that it's unlikely to make much (or any) performance difference, but IMO the bitshift form is clearer, because it says exactly what he's trying to do: he doesn't actually want the result of multiplying by 1000, he wants to move some bits out of the way of others, which is what a bitshift expresses. I'd find that a lot more intuitive when trying to debug a hash function than a multiply.
Peter
@Brian Just to prove point #2, you got the operator precedence wrong. Your code is equivalent to `val.row.value() << (10 + val.col.value())` (which would be a very bad hash function indeed, since most values would map to bucket 0 after a modulus is taken). This is why it's not a good idea to mix bitwise operations with arithmetic operations, and why it's not a good idea to prematurely optimize in general.
Ken Bloom
@Ken: fixed, but I don't agree with any of your points, which is presumably why you downvoted me. Basically, I don't see this as an optimization, and I don't see why multiplication is clearer. If you want 4x the apples, you'd use multiplication; if you're computing a mathematical function, who cares? Anyway, moving on...
Brian R. Bondy
+3  A: 

The following technique is given in Effective Java (2nd edition), and quoted from there in Programming in Scala. Pick a prime constant (Programming in Scala uses 41, Effective Java uses 31; you may find something larger gives a more even distribution here), and perform multiplication and addition as follows:

(41 + int_hash(row)) * 41 + int_hash(col)

For more values (say you add a z coordinate), just keep nesting, like

((41 + int_hash(row)) * 41 + int_hash(col)) * 41 + int_hash(z)

Where int_hash is a function for hashing a single integer. You can visit this page to find a bunch of good hash functions for single integers.

Ken Bloom
Thanks for this, just what I needed.
quantumpotato
A: 

With a small enough domain, you might be able to come up with a perfect hash function. Or perhaps just use a two-dimensional array. For larger amounts of data, use a prime-number-based multiplication, and if your table size is a power of two, reduce with a mask instead of a mod. This avoids the divide/mod, which can be costly on smaller, embedded-type systems.

Or find any number of integer-based hash functions that already exist. Make sure you measure any hash function you create for collisions. Enough collisions will eliminate any gains over O(log n) structures such as maps/trees.

Michael Dorgan