ansaurus

Question

Calculating the spread of the hash function for a hashmap which uses chaining

Answer 1

+1 A:

When I've worked on improving hash functions, I've used the sum of the squares of the chain lengths divided by the number of items inserted (and attempted to minimize the result). In your first example, you've inserted 8 items and the sum of the squares of the lengths is 16, so your "figure of merit" is 2.

In the second, you've inserted 20 items, and the sum of the squares is 130, so your figure of merit would be 6.5. I'd say the first was likely to be a better hash function in general (though I generally prefer to compare results from identical inputs).
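
A minimal sketch of that calculation, assuming the chain lengths are available as a plain list. The function name `figure_of_merit` is made up for illustration, and the two lists are hypothetical lengths chosen only so that they reproduce the totals quoted above (8 items with squares summing to 16, and 20 items with squares summing to 130); they are not the layouts from the question.

```python
def figure_of_merit(chain_lengths):
    """Sum of squared chain lengths divided by the number of items stored.

    Lower is better. When no chain holds more than one item, the result is 1.0.
    """
    items = sum(chain_lengths)
    if items == 0:
        raise ValueError("no items inserted")
    return sum(n * n for n in chain_lengths) / items


# Hypothetical chain lengths that match the totals quoted in the answer.
first = figure_of_merit([2, 2, 2, 2])   # 8 items, squares sum to 16
second = figure_of_merit([8, 7, 4, 1])  # 20 items, squares sum to 130
print(first, second)  # 2.0 6.5
```

Squaring the lengths penalizes long chains much more heavily than short ones, so a hash function that piles items into a few buckets scores noticeably worse than one that spreads them out.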

Jerry Coffin 2010-10-16 19:23:56

Thanks, this makes sense and I will give it a try in comparing and improving the hash functions, but it does not give me a "percentage" of spread.

toefel 2010-10-17 14:09:35

Answer 2

+2 A:

If you're only using this to tune the hash functions themselves, you could compute a genuine measure of statistical dispersion, such as the Gini coefficient. On the other hand, if you're trying to make this a feature of the hash-map itself, I would recommend against it - computing a complicated benchmark as part of the 'is resize necessary' logic has its own performance costs; something naïve is probably better.
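
As a rough sketch of what this could look like, the Gini coefficient can be computed over the per-bucket chain lengths. The function name `gini` and the sample lists are assumptions made for illustration. The formula used is the standard one for sorted values: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n, with i starting at 1.

```python
def gini(chain_lengths):
    """Gini coefficient of the chain lengths.

    0.0 means every bucket holds the same number of items; the maximum,
    (n - 1) / n for n buckets, means all items sit in a single bucket.
    """
    values = sorted(chain_lengths)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0  # nothing stored, so there is no inequality to measure
    weighted = sum(rank * x for rank, x in enumerate(values, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n


print(gini([2, 2, 2, 2]))  # perfectly even spread
print(gini([0, 0, 0, 8]))  # everything in one bucket
print(gini([8, 7, 4, 1]))  # somewhere in between
```

Because the result is bounded between 0 and (n - 1) / n, it can be scaled to a percentage if a single "spread" number is wanted, though it measures inequality between buckets rather than lookup cost directly.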

Ani 2010-10-16 19:26:12

Thanks, I will look into it. I am using it just for tuning the hash functions; for resizing I am using a configurable load factor (say .80) and resize strategy (2.0).

toefel 2010-10-17 14:15:37

Answer 3

+1 A:

You probably care about the answer because you want to know how much work you're doing with chaining. Thus, you should probably instrument your hash map to report how much work it's doing (a few #ifdefs that increment a counter in the key methods will probably do the trick). You can then use the amount of work (number of comparisons, number of nodes followed, etc.) as a metric for your hash function, and as a bonus you get a nifty tool for performance tuning. Once you figure things out, you can remove the instrumentation.
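
A small sketch of the idea in Python, where a runtime `instrument` flag stands in for the `#ifdef` guards mentioned above. The class `InstrumentedChainedMap`, the helper `compares_per_lookup`, and the two toy hash functions are all hypothetical and exist only to show the counters at work; a real map would hook the counters into its own lookup path.

```python
class InstrumentedChainedMap:
    """Toy chained hash map that counts key comparisons when instrumented."""

    def __init__(self, bucket_count, hash_fn, instrument=True):
        self.buckets = [[] for _ in range(bucket_count)]
        self.hash_fn = hash_fn
        self.instrument = instrument
        self.reset_counters()

    def reset_counters(self):
        self.compares = 0
        self.lookups = 0

    def _chain(self, key):
        return self.buckets[self.hash_fn(key) % len(self.buckets)]

    def put(self, key, value):
        chain = self._chain(key)
        for i, (k, _) in enumerate(chain):
            if self.instrument:
                self.compares += 1
            if k == key:
                chain[i] = (key, value)  # overwrite existing entry
                return
        chain.append((key, value))

    def get(self, key):
        if self.instrument:
            self.lookups += 1
        for k, v in self._chain(key):
            if self.instrument:
                self.compares += 1
            if k == key:
                return v
        raise KeyError(key)


def compares_per_lookup(hash_fn, keys, bucket_count):
    """Insert all keys, then measure the average comparisons per lookup."""
    table = InstrumentedChainedMap(bucket_count, hash_fn)
    for key in keys:
        table.put(key, None)
    table.reset_counters()  # only count the work done by lookups
    for key in keys:
        table.get(key)
    return table.compares / table.lookups


keys = list(range(8))
print(compares_per_lookup(lambda k: 0, keys, 8))  # everything in one chain: 4.5
print(compares_per_lookup(lambda k: k, keys, 8))  # one key per bucket: 1.0
```

The average number of comparisons per lookup measures the cost that chaining actually imposes, so two hash functions can be compared by running the same keys through both.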

Rex Kerr 2010-10-17 04:39:39
