Just to add a bit of detail to what has already been said.
Sorted Containers
The immutability is extremely important here: `std::map` and `std::set` are usually implemented in terms of binary trees (red-black trees in the few STL versions I have looked at) because of the requirements on the insertion, retrieval and deletion operations (and notably because of the iterator-invalidation requirements).
However, because of immutability, as you suspected there are other candidates, not the least of them being array-like containers. Here they have a few advantages:
- minimal overhead (in terms of memory)
- contiguity of memory, and thus cache locality
Several "Random Access Containers" are available here:
Boost.Array
std::vector
std::deque
So the only thing you actually need to do can be broken down into 2 steps:
- push all your values into the container of your choice, then (after all of them have been inserted) sort it once with `std::sort`
- search for the value using `std::binary_search`, which has O(log(n)) complexity
Thanks to cache locality, the search will in practice be faster than with a tree-based container, even though the asymptotic complexity is the same.
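Putting those two steps together, a minimal sketch (with placeholder values, assuming plain `int` keys) could look like this:

```cpp
#include <algorithm>
#include <initializer_list>
#include <iostream>
#include <vector>

int main() {
    std::vector<int> values;

    // 1. Push all the values first (no need to keep the container sorted
    //    during insertion), then sort once at the end.
    for (int v : {42, 7, 19, 3, 23}) {  // placeholder data
        values.push_back(v);
    }
    std::sort(values.begin(), values.end());

    // 2. Look values up with std::binary_search: O(log n) comparisons
    //    over contiguous, cache-friendly storage.
    std::cout << std::boolalpha
              << std::binary_search(values.begin(), values.end(), 19) << '\n'  // true
              << std::binary_search(values.begin(), values.end(), 8)  << '\n'; // false
}
```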
If you don't want to reinvent the wheel, you can also check Alexandrescu's [AssocVector][1]. Alexandrescu basically ported the `std::set` and `std::map` interfaces over a `std::vector`:
- because it's faster for small datasets
- because it can be faster for frozen datasets
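With the Loki library installed, usage would look roughly like the sketch below; the header path and the exact template parameters are assumptions based on Loki's documented `std::map`-like interface:

```cpp
#include <iostream>
#include <utility>
#include <loki/AssocVector.h>  // assumed header path; check your Loki installation

int main() {
    // Drop-in replacement for std::map, backed by a sorted std::vector.
    Loki::AssocVector<int, const char*> names;
    names.insert(std::make_pair(42, "forty-two"));
    names.insert(std::make_pair(7, "seven"));

    // Lookups are binary searches over contiguous storage.
    Loki::AssocVector<int, const char*>::iterator it = names.find(42);
    if (it != names.end()) {
        std::cout << it->second << '\n';
    }
}
```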
Unsorted Containers
Actually, if you really don't care about order and your collection is kind of big, then an `unordered_set` will be faster, especially because integers are so trivial to hash: `size_t hash_method(int i) { return i; }`.
This could work very well... unless you're faced with a collection that somehow causes a lot of collisions, because then unordered containers fall back to searching the list of entries that share a given hash bucket in linear time.
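For reference, a minimal sketch of the hash-set approach, using C++11's `std::unordered_set` (placeholder values; `boost::unordered_set` behaves similarly):

```cpp
#include <iostream>
#include <unordered_set>

int main() {
    // Build once; std::hash<int> is essentially the identity, so hashing is cheap.
    const std::unordered_set<int> values = {42, 7, 19, 3, 23};  // placeholder data

    // Average O(1) lookup, degrading toward O(n) if many values land in the
    // same bucket.
    std::cout << std::boolalpha
              << (values.count(19) > 0) << '\n'  // true
              << (values.count(8)  > 0) << '\n'; // false
}
```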
Conclusion
Just try the sorted `std::vector` approach and the `boost::unordered_set` approach with a "real" dataset (and all optimizations on) and pick whichever gives you the best result.
Unfortunately we can't really help more than that, because it heavily depends on the size of the dataset and the distribution of its elements.
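As a starting point, a minimal benchmark harness might look like the following; the random integers and the sizes are placeholders for your real dataset, and `std::unordered_set` stands in for `boost::unordered_set`:

```cpp
#include <algorithm>
#include <chrono>
#include <iostream>
#include <random>
#include <unordered_set>
#include <vector>

// Time a callable and return elapsed milliseconds.
template <typename F>
long long run_ms(F&& f) {
    const auto start = std::chrono::steady_clock::now();
    f();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
}

int main() {
    // Placeholder dataset and queries; substitute your real data here.
    std::mt19937 gen(12345);
    std::uniform_int_distribution<int> dist(0, 10000000);
    std::vector<int> data(1000000);
    std::vector<int> queries(1000000);
    for (int& v : data)    v = dist(gen);
    for (int& q : queries) q = dist(gen);

    std::size_t hits = 0;  // accumulate results so the loops are not optimized away

    // Sorted std::vector + std::binary_search.
    std::vector<int> sorted = data;
    std::sort(sorted.begin(), sorted.end());
    const long long vec_ms = run_ms([&] {
        for (int q : queries)
            hits += std::binary_search(sorted.begin(), sorted.end(), q);
    });

    // Hash set.
    const std::unordered_set<int> set(data.begin(), data.end());
    const long long set_ms = run_ms([&] {
        for (int q : queries) hits += set.count(q);
    });

    std::cout << "sorted vector: " << vec_ms << " ms\n"
              << "unordered_set: " << set_ms << " ms\n"
              << "hits: " << hits << '\n';
}
```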