tags:
views: 320
answers: 2

Does anyone know the answer to this question?

+5  A: 

Yes, still constant time (amortized).

Justice
What does amortized mean?
SuperString
In theory... memory paging could be an issue. Unlikely, but possible.
Matthew Whited
Amortized means that some individual inserts may take a longer time than others, but the average time remains constant.
JSBangs
It means that some accesses will take longer, but if you look at the performance over all queries the running time is O(1). That is, over a sequence of accesses the average access time will be constant: http://en.wikipedia.org/wiki/Amortized_analysis
tvanfosson
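
To make the doubling argument behind "amortized" concrete, here is a small editorial sketch in C++ (not from the thread; it assumes the common grow-by-doubling rehash policy) that counts how many element copies rehashing performs across n inserts:

    #include <cstddef>
    #include <iostream>

    int main() {
        // Assumed policy: the table doubles whenever it fills up, and each
        // rehash copies every existing element once into the larger table.
        const std::size_t n = 100000000;  // 100 million inserts
        std::size_t capacity = 1, copies = 0;
        for (std::size_t size = 1; size <= n; ++size) {
            if (size > capacity) {   // table is full: rehash into one twice as large
                copies += capacity;  // every existing element is copied once
                capacity *= 2;
            }
        }
        // copies = 1 + 2 + 4 + ... < 2 * n, so the average (amortized) cost
        // per insert stays constant even though the individual inserts that
        // trigger a rehash are expensive.
        std::cout << "copies per insert: " << double(copies) / n << '\n';
    }

Rehashes become geometrically rarer as the table grows, so the total copying work stays below 2n and the average cost per insert remains constant.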
+5  A: 

Yes. To search a hash map with 100 million items added to it, you do this:

1) Calculate the hash of the object you're looking for.
2) Find that bucket
3) Search through that bucket for the item.

(1) is independent of the size of the hash map or number of items in it.
(2) is O(1), assuming a standard hashmap implemented as an array of linked lists.
(3) takes an amount of time related to the number of items in the bucket, which should be approximately (number of items added to hash) / (number of buckets). This part will start at O(1), but will very slowly increase as the number of items begins to greatly exceed the number of buckets.

For almost any purpose, hash maps can be considered O(1) for both insertion and retrieval, even with very large data sets, as long as you start with a sufficiently large number of buckets.
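
As an editorial illustration of the three steps above (not code from this answer; ChainedMap and its methods are made-up names), a minimal separate-chaining map in C++ might look like this:

    #include <cstddef>
    #include <functional>
    #include <iostream>
    #include <list>
    #include <string>
    #include <utility>
    #include <vector>

    template <typename K, typename V>
    class ChainedMap {
        std::vector<std::list<std::pair<K, V>>> buckets_;
        std::hash<K> hash_;

    public:
        explicit ChainedMap(std::size_t bucket_count) : buckets_(bucket_count) {}

        void insert(const K& key, const V& value) {
            buckets_[hash_(key) % buckets_.size()].emplace_back(key, value);
        }

        // Lookup: (1) hash the key, (2) pick the bucket, (3) scan the bucket.
        const V* find(const K& key) const {
            const auto& bucket = buckets_[hash_(key) % buckets_.size()]; // steps 1 and 2: O(1)
            for (const auto& entry : bucket) {                           // step 3: O(items in bucket)
                if (entry.first == key) return &entry.second;
            }
            return nullptr;
        }
    };

    int main() {
        ChainedMap<std::string, int> ages(1024); // fixed bucket count for this sketch
        ages.insert("alice", 30);
        if (const int* age = ages.find("alice")) {
            std::cout << "alice is " << *age << '\n';
        }
    }

Steps 1 and 2 cost the same regardless of how many items are stored; only the scan in step 3 grows, in proportion to the load factor (items per bucket).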

Aric TenEyck
+1 good answer, point (3) is important
Paolo
And provided the hash is evenly distributed for your data set.
Yann Schwartz
Is there a way to increase the number of buckets in C++ so that each bucket only has one element?
SuperString
@Danny: That depends on your hash map algorithm. I'm not familiar with the C++ STL (assuming that's what you're using), so I can't say. That said, increasing to the point where each bucket only has one element is overkill - the Birthday Paradox says that for 100 million elements, you'd need a hash map with quadrillions of buckets to avoid collisions, assuming a properly distributed random hash.
Aric TenEyck
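
If the container is C++11's std::unordered_map (an assumption; the question doesn't name the container), the bucket count can be influenced via max_load_factor and reserve. A minimal sketch (scaled down from 100 million to keep memory modest):

    #include <iostream>
    #include <string>
    #include <unordered_map>

    int main() {
        std::unordered_map<int, std::string> map;

        // Allow at most ~0.5 elements per bucket on average, then
        // pre-allocate buckets for the expected element count.
        map.max_load_factor(0.5f);
        map.reserve(1000000);  // the thread's 100 million works the same way

        std::cout << "bucket count: " << map.bucket_count() << '\n';
    }

This keeps buckets sparse on average, but, per the birthday paradox point above, it cannot guarantee exactly one element per bucket.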
You could use dynamic perfect hashing: http://en.wikipedia.org/wiki/Dynamic_perfect_hashing
Michael Munsey