views:

97

answers:

4

Hi all,

I have an infinite streams of numbers coming and I have to detect the first duplicate element. I think of using hash table for the above problem i.e whenever a number arrives, check whether it is already there in the hash table or not. In case it has, stop otherwise add that number to hash table. Now my question is does hash table stores the integers values or only the hash values corresponding to those integers as key?

Thanks in advance

A: 

hash_map are hashed associative container, meaning that they associate a key to a value. So yes, the value you give it is stored in the hash_map. Usually, you have not access to the hashcode directly. To solve your problem with an hash_map, you will have to do something like myHashMap[myInt] = true;when you get myInt. Then for the next int, you will have to check if your int is already stored ...

As you have just to check if it is here or not that's perhaps not the best choice. A set seems well fitted. You perhaps have to check the performance of the fetching operation, but I think it is well optimized in STL set.

my2c

neuro
A: 

The hash function is used to place the pair <key,value>, so when you insert a number as a key, it get stored in the position assigned by the function with the data associated as value.

You only have to try to get that integer as a key and increase it value.

sui
A: 

Before using a container, you need to think about the time constraints the compiler may need to sort the elements into a set or an hash map, you mentioned an infinite stream of numbers. ? If you are only concened about your look up time then set is fine. If you use a hash_map what would you intend to put in the value field for a number you received ? And even if you use a vector you can use lower_bound to get the same look up time as for a set.

DumbCoder
A: 

The answer depends on the implementation constraints, but if the numbers are int32's or smaller, my preferred approach is to lazy-allocate 512MB (so individual physical pages will be allocated only should you use it, a private mmap of /dev/zero is the traditional approach under unix) and use it as a bitset. If the word size is smaller then you will have to resort to some sort of hash (probably multi-level), variant of a k-ary tree or some combination of the above.

Note that the multi-level hash or

If you are looking at anything larger than an int32 you haven't provided enough information to provide advice. The same applies if you can't afford 512MB of ram for your bitset.

Recurse