What is the more efficient approach for using hashmaps?

A) Use multiple smaller hashmaps, or

B) store all objects in one giant hashmap?

(Assume that the hash function for the keys is of good quality, resulting in few collisions.)

CLARIFICATION: Option A implies segregation by primary key, i.e. no additional lookup is necessary to determine which actual hashmap to use. (For example, if the lookup keys are alphanumeric, Hashmap 1 stores the keys starting with A, Hashmap 2 those starting with B, and so on.)

+5  A: 

Definitely B. The advantage of hash tables is that the average number of comparisons per lookup is independent of the number of entries (at a given load factor).

If you split your map into N smaller hashmaps, a successful lookup will have to search about half of them on average, and a failed lookup will have to search all of them. If the smaller hashmaps have the same load factor that the larger map would have had, you will increase the total number of comparisons by a factor of approximately N/2.

And if the smaller hashmaps have a smaller load factor, you are wasting memory.
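A minimal Java sketch of the random-distribution case, assuming keys are assigned to one of N `HashMap`s at random (the class and method names are made up for illustration). Because nothing about a key says which map holds it, a lookup has to try the maps one by one; counting those probes should show roughly (N + 1) / 2 maps checked per hit and all N per miss, versus exactly one for a single map.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical illustration: keys are scattered across n maps with no rule
// tying a key to a particular map, so a lookup must try the maps in turn.
class RandomlyShardedLookup {
    final List<Map<String, Integer>> shards = new ArrayList<>();
    final Random random = new Random(42); // fixed seed so runs are repeatable

    RandomlyShardedLookup(int n) {
        for (int i = 0; i < n; i++) {
            shards.add(new HashMap<>());
        }
    }

    // Assumes each key is inserted only once; the shard is chosen at random.
    void put(String key, int value) {
        shards.get(random.nextInt(shards.size())).put(key, value);
    }

    // Number of maps probed until the key is found (all n if it is absent).
    int probesFor(String key) {
        int probes = 0;
        for (Map<String, Integer> shard : shards) {
            probes++;
            if (shard.containsKey(key)) {
                return probes;
            }
        }
        return probes;
    }

    // Average probes per successful lookup after inserting `keys` distinct keys.
    static double averageHitProbes(int n, int keys) {
        RandomlyShardedLookup lookup = new RandomlyShardedLookup(n);
        for (int i = 0; i < keys; i++) {
            lookup.put("key" + i, i);
        }
        long total = 0;
        for (int i = 0; i < keys; i++) {
            total += lookup.probesFor("key" + i);
        }
        return (double) total / keys;
    }

    public static void main(String[] args) {
        System.out.printf("average probes per hit with 8 maps: %.2f%n",
                averageHitProbes(8, 10_000));
    }
}
```

With 8 maps the average should come out close to 4.5 probes per hit, and each probe is itself a full hash lookup.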

All of that assumes you distribute the keys randomly between the smaller hashmaps. If you distribute them according to some function of the key (e.g. a string prefix), then what you have created is a trie, which is efficient for some applications (e.g. auto-complete in web forms).
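For comparison, a hypothetical sketch of the prefix routing the question describes, assuming `String` keys: the first character picks the inner map, so no search across maps is needed. Note that a `get` here does two hash lookups (outer, then inner) where one big `HashMap<String, V>` would do one, so by itself this is only the first level of a trie and is not expected to be faster.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the question's option A: the first character of the
// key selects the inner map, so a lookup touches exactly one inner map.
class PrefixShardedMap<V> {
    private final Map<Character, Map<String, V>> shards = new HashMap<>();

    // Route the empty string to a sentinel shard so charAt(0) is never called on it.
    private static char shardOf(String key) {
        return key.isEmpty() ? '\0' : key.charAt(0);
    }

    void put(String key, V value) {
        shards.computeIfAbsent(shardOf(key), c -> new HashMap<>()).put(key, value);
    }

    V get(String key) {
        Map<String, V> shard = shards.get(shardOf(key)); // first hash lookup
        return shard == null ? null : shard.get(key);    // second hash lookup
    }

    int shardCount() {
        return shards.size();
    }

    public static void main(String[] args) {
        PrefixShardedMap<Integer> map = new PrefixShardedMap<>();
        map.put("apple", 1);
        map.put("avocado", 2);
        map.put("banana", 3);
        System.out.println(map.get("avocado")); // 2
        System.out.println(map.shardCount());   // 2 (one for 'a', one for 'b')
    }
}
```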

finnw
The first sentence assumes that the objects' `hashCode` methods all generate well-distributed hash values. In the worst case (i.e. where all objects hash to the same value) hashtable lookup will be `O(N)`.
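To make the worst case described above concrete, here is a hypothetical key type whose `hashCode()` always returns the same value (the class name and the call counter are illustrative only). Since Java 8, `HashMap` converts long buckets into balanced trees, which limits the damage for keys that implement `Comparable`; this key deliberately does not, so looking up an absent key still has to call `equals` against every stored key.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical key type whose hashCode() is constant, so every instance
// lands in the same bucket.
final class CollidingKey {
    static int equalsCalls = 0; // counts comparisons, for illustration only

    private final int id;

    CollidingKey(int id) {
        this.id = id;
    }

    @Override
    public int hashCode() {
        return 42; // every key collides
    }

    @Override
    public boolean equals(Object o) {
        equalsCalls++;
        return o instanceof CollidingKey && ((CollidingKey) o).id == id;
    }

    public static void main(String[] args) {
        int n = 1_000;
        Map<CollidingKey, Integer> map = new HashMap<>();
        for (int i = 0; i < n; i++) {
            map.put(new CollidingKey(i), i);
        }
        equalsCalls = 0;
        // The hash cannot rule anything out, so a miss compares against every key.
        boolean found = map.containsKey(new CollidingKey(-1));
        System.out.println(found + ", equals() calls: " + equalsCalls);
    }
}
```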
Stephen C
+4  A: 

Are these maps used in logically distinct places? For instance, I wouldn't have one map containing users, cached query results, loggers, etc., just because you happen to know the keys won't clash. However, I equally wouldn't split up a single map into multiple maps.

Keep one hashmap for each logical mapping from key to value.
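A small sketch of "one map per logical mapping", using made-up names: each mapping gets its own field with precise key and value types, rather than one `Map<Object, Object>` shared by unrelated data.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical application state: each logical mapping has its own map.
class AppRegistry {
    static final class User {
        final String name;

        User(String name) {
            this.name = name;
        }
    }

    final Map<Integer, User> usersById = new HashMap<>();
    final Map<String, String> cachedQueryResults = new HashMap<>();

    public static void main(String[] args) {
        AppRegistry registry = new AppRegistry();
        registry.usersById.put(1, new User("alice"));
        registry.cachedQueryResults.put("SELECT 1", "1");

        // The value type is known, so no cast is needed...
        User u = registry.usersById.get(1);
        System.out.println(u.name);
        // ...and mixing up the mappings fails to compile, e.g.
        // registry.usersById.put("SELECT 1", "1");
    }
}
```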

Jon Skeet
+1  A: 

In addition to @Jon's answer, there can be practical reasons why you want to maintain separate hash tables.

If you have separate tables for different mappings, you can clear each of the mappings independently, e.g. by calling `clear()` on the corresponding table or by dropping the reference to it.

If the separate tables hold mappings to cached entries, you can use different strategies to 'age' the respective entries.
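Both of the last two points can be sketched with two hypothetical caches: one is a size-bounded LRU built on an access-ordered `LinkedHashMap` with `removeEldestEntry` overridden, the other a plain `HashMap` that is only emptied explicitly. The names and sizes are assumptions for illustration; the point is that evicting or clearing in one table never touches the other.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical pair of caches kept in separate tables, each with its own aging policy.
class SeparateCaches {
    // Size-bounded LRU: access-ordered LinkedHashMap that drops its least
    // recently used entry once it holds more than maxEntries.
    static <K, V> Map<K, V> lruCache(int maxEntries) {
        return new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries;
            }
        };
    }

    final Map<String, String> pageCache = lruCache(2);       // small, evicts LRU
    final Map<String, String> configCache = new HashMap<>(); // kept until cleared

    public static void main(String[] args) {
        SeparateCaches c = new SeparateCaches();
        c.pageCache.put("/a", "A");
        c.pageCache.put("/b", "B");
        c.pageCache.get("/a");      // touch /a so /b becomes least recently used
        c.pageCache.put("/c", "C"); // evicts /b
        c.configCache.put("timeout", "30");

        System.out.println(c.pageCache.keySet()); // [/a, /c]
        c.configCache.clear();                    // flush config only
        System.out.println(c.pageCache.size() + " " + c.configCache.size()); // 2 0
    }
}
```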

If the application is multi-threaded, using separate tables may reduce lock contention, and may (for some processor architectures) increase CPU cache hit ratios.
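A rough sketch of the contention point, assuming each table is wrapped with `Collections.synchronizedMap` (which synchronizes on its own wrapper object), so threads working on different tables take different locks. For a single shared table, `java.util.concurrent.ConcurrentHashMap` is the usual way to avoid one global lock; the class and field names here are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: two independent tables, each guarded by its own lock,
// so threads that only touch different tables never wait for each other.
class SeparateLocks {
    final Map<String, Integer> hits = Collections.synchronizedMap(new HashMap<>());
    final Map<String, Integer> errors = Collections.synchronizedMap(new HashMap<>());

    static Thread worker(Map<String, Integer> table, String key, int times) {
        return new Thread(() -> {
            for (int i = 0; i < times; i++) {
                // The synchronized wrapper holds its lock for the whole merge.
                table.merge(key, 1, Integer::sum);
            }
        });
    }

    public static void main(String[] args) throws InterruptedException {
        SeparateLocks s = new SeparateLocks();
        Thread t1 = worker(s.hits, "home", 100_000);
        Thread t2 = worker(s.errors, "timeout", 100_000);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println(s.hits + " " + s.errors); // {home=100000} {timeout=100000}
    }
}
```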

Stephen C