views: 646 | answers: 6
If I have a key set of 1000, what is a suitable size for my Hash table, and how is that determined?

+2  A: 

There's some discussion of these factors in the documentation for Hashtable.

sblundy
+1  A: 

You need to factor in the hash function as well.

One rule of thumb is to make the table size about double the number of keys, so that there is room to expand and, hopefully, to keep the number of collisions small.

Another rule of thumb is to assume that you are doing some sort of modulo-related hashing: round your table size up to the next largest prime number and use that prime as the modulo value, as in the sketch below.
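A minimal sketch of that rule of thumb (the nextPrime helper and the key string are made up for illustration, not taken from any library):

public class PrimeSizing {
    static boolean isPrime(int n) {
        if (n < 2) return false;
        for (int i = 2; (long) i * i <= n; i++) {
            if (n % i == 0) return false;
        }
        return true;
    }

    // Round up to the next prime at or above n.
    static int nextPrime(int n) {
        while (!isPrime(n)) n++;
        return n;
    }

    public static void main(String[] args) {
        int tableSize = nextPrime(2 * 1000);   // 1000 keys, doubled, rounded up -> 2003
        // Modulo-based bucket index (mask off the sign bit first).
        int bucket = ("someKey".hashCode() & 0x7fffffff) % tableSize;
        System.out.println(tableSize + " -> bucket " + bucket);
    }
}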

What kind of things are you hashing? More detail should generate better advice.

EvilTeach
+3  A: 

It depends on the load factor (the "percent full" point where the table will increase its size and re-distribute its elements). If you know you have exactly 1000 entries, and that number will never change, you can just set the load factor to 1.0 and the initial size to 1000 for maximum efficiency. If you weren't sure of the exact size, you could leave the load factor at its default of 0.75 and set your initial size to 1334 (expected size/LF) for really good performance, at a cost of extra memory.

You can use the following constructor to set the load factor:

Hashtable(int initialCapacity, float loadFactor)
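A small usage sketch of that constructor with the numbers from the answer (the keys are made up for illustration):

import java.util.Hashtable;

public class SizingExample {
    public static void main(String[] args) {
        // Exact size known and fixed: capacity 1000, load factor 1.0.
        Hashtable<String, Integer> exact = new Hashtable<>(1000, 1.0f);

        // Size only roughly known: capacity 1334 (about 1000 / 0.75) at the default
        // load factor, so 1000 entries never trigger a rehash.
        Hashtable<String, Integer> roomy = new Hashtable<>(1334, 0.75f);

        for (int i = 0; i < 1000; i++) {
            exact.put("key" + i, i);
            roomy.put("key" + i, i);
        }
    }
}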
Bill the Lizard
Assuming that the hash function is well-behaved over the set of expected keys. A home-brewed hash function may not behave well in a minimally-sized table. For a home-brewed function, you'd have to run experiments.
S.Lott
If the hash function isn't well-behaved, colliding elements will be stored in the same bucket (in a LinkedList). The table being minimally-sized will have no effect whatsoever on performance.
Bill the Lizard
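To illustrate the "not well-behaved" case being discussed, here is a hypothetical key class (made up for this example) whose hashCode sends every key to the same bucket, so lookups degrade to a linear scan of the chained entries no matter how the table is sized:

import java.util.Hashtable;

class BadKey {
    final String value;
    BadKey(String value) { this.value = value; }

    // Every instance hashes to the same value, so all entries collide
    // and end up chained in a single bucket.
    @Override public int hashCode() { return 42; }

    @Override public boolean equals(Object o) {
        return o instanceof BadKey && ((BadKey) o).value.equals(value);
    }
}

public class CollisionDemo {
    public static void main(String[] args) {
        Hashtable<BadKey, Integer> table = new Hashtable<>(2003);
        for (int i = 0; i < 1000; i++) {
            table.put(new BadKey("key" + i), i);   // all 1000 entries share one bucket
        }
        System.out.println(table.get(new BadKey("key500")));   // found, but via a linear scan
    }
}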
A: 

Twice is good.

You don't have a big key set. Don't bother with complicated discussions about your HashTable implementation; just go for 2000.

poulejapon
2000 does not make a good size, because it is not prime. 2001 would be better: it is not prime either, but at least it is not even, and it will distribute the keys in the table much better. A good hashtable implementation will take care of the hash function, but most of the time it is the size that is used.
ReneS
This is an interesting question. Your statement is right if you use a hash function of the form H(s) = s[0] + b*s[1] + b^2*s[2] + ... + b^(N-1)*s[N-1]. I think today's industry standard is to use 2^k as the size together with better hash functions such as Jenkins's. Last time I checked, the std implementation was still working with primes, however.
poulejapon
Prime and odd numbers are cooler ;)
ReneS
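For reference, a minimal sketch of the polynomial hash H(s) mentioned in the comments above (the base b = 31 is an arbitrary choice for the example, and overflow simply wraps around in int arithmetic):

public class PolyHash {
    // H(s) = s[0] + b*s[1] + b^2*s[2] + ...
    static int hash(String s, int b) {
        int h = 0;
        int pow = 1;
        for (int i = 0; i < s.length(); i++) {
            h += pow * s.charAt(i);
            pow *= b;
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(hash("example", 31));
    }
}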
+1  A: 

Let it grow. At this size, the automatic handling is fine. Other than that, 2 x size + 1 is a simple formula (see the sketch below). Prime numbers are also kind of good, but as soon as your data set reaches a certain size, the hash implementation may decide to rehash and grow the table anyway.

Your keys are what drive the effectiveness, and hopefully they are distinct enough.

Bottom line: ask the size question when you actually run into problems, such as size or slow performance; other than that, do not worry!
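A sketch of the two options described above, letting the table grow versus pre-sizing with 2 x size + 1 (the keys are made up for illustration):

import java.util.Hashtable;

public class LetItGrow {
    public static void main(String[] args) {
        // Option 1: default capacity; with 1000 keys the automatic rehashing
        // as the table grows is cheap enough not to worry about.
        Hashtable<String, Integer> grown = new Hashtable<>();
        for (int i = 0; i < 1000; i++) {
            grown.put("key" + i, i);
        }

        // Option 2: pre-size with the simple 2 x size + 1 formula.
        Hashtable<String, Integer> presized = new Hashtable<>(2 * 1000 + 1);
    }
}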

ReneS
Worry about it if performance *in this area* becomes an issue. If you try to handle it up front, you are more likely to introduce a bug or simply end up with unnecessarily complex code, which can cause a maintenance issue.
Michael Rutherfurd
I agree. Have the problem first and look for a solution afterwards.
ReneS
A: 

I'd like to reiterate what http://stackoverflow.com/users/33229/wwwflickrcomphotosrene-germany said above. 1000 doesn't seem like a very big hashtable to me. I've used a lot of hashtables of about that size in Java without seeing much in the way of performance problems, and I hardly ever muck about with the size or load factor.

If you've run a profiler on your code and determined that the hashtable is your problem, then by all means start tweaking. Otherwise, I wouldn't assume you've got a problem until you're sure.

After all, in most code, the performance problem isn't where you think it is. I try not to anticipate.

Terry Lacy