views:

995

answers:

7

Please, excuse my ignorance about this subject :)

C++0x is introducing unordered_set which is available in boost and many other places. What I understand is that unordered_set is hash table with O(1) lookup complexity. On the other hand, set is nothing but a tree with log(n) lookup complexity. Why on earth would anyone use set instead of unordered_set? i.e is there a need for set anymore?

+1  A: 

Off hand, I would say it is convenient to have things in a relationship if you're looking to convert it into a different format.

It is also possible that whilst one is faster to access, the time to build the index or the memory used when creating and/or accessing it is greater.

Rushyo
+1, Big Oh notation hides the constant factors, and for typical problem sizes it's often the constant factors that matter most.
j_random_hacker
+9  A: 

Whenever you prefer a tree to a hash table.

For instance, hash tables are "O(n)" at worst case. O(1) is the average case. Trees are "O(log n)" at worst.

Mehrdad Afshari
/Balanced/ trees are O(ln n) in the worst case. You can end up with O(n) trees (essentially linked lists).
strager
If you can write a reasonably intelligent hash function, you can almost always get O(1) perf out of a hashtable.If you can't write such a hash function of if you need to iterate "in order" over your set, then you should use a tree. But you shouldn't use a tree because you're scared of "O(n) worst-case performance."
Justin L.
stager: To be pedantic, yes. However, we're talking about set in C++ which is typically implemented as a **balanced binary search tree**. We should have specify the actual operation to talk about complexity. In this context it's obvious that we're talking about lookup.
Mehrdad Afshari
Justin L: It's just **one** reason you might prefer a tree. The core of my answer is the first line. **Whenever** you prefer a tree data structure to a hash table. There are plenty of cases that trees are preferred to hash tables. Hash tables particularly suck at things like "range intersections."
Mehrdad Afshari
stl trees are almost universally implemented red-black trees, an advanced self balancing tree. There really are cases where O(n) look up in the worse case is not acceptable. A web service that provide and interface to store user values should not use a hash map, as a malicious user could effectively create a DoS by storing specially crafted values. Critical, time sensitive systems may also not allow for O(n) lookup, air traffic control etc. Though in general you're right, use the hash maps by default and only switch the tree version when you've got a real need.
caspin
@strager: The C++ standard mandates logarithmic (*not* amortized logarithmic) insertion and lookup times for all associative containers (`set`, `map` etc.) So balanced trees will be used, at least until someone comes up with a better data structure (don't hold your breath!)
j_random_hacker
+24  A: 

When, for someone who wants to iterate over the items of the set, the order matters.

moonshadow
I think your answer is what I am missing :)
AraK
+1  A: 

If you want to have things sorted, then you would use set instead of unordered_set. unordered_set is used over set when ordering stored does not matter.

leiz
+6  A: 

Because std::set is part of Standard C++ and unordered_set isn't. C++0x is NOT a standard, and neither is Boost. For many of us, portability is essential, and that means sticking to the standard.

anon
If i understand him correctly, he is not asking why people currently still use set. He is informing himself about C++0x.
Johannes Schaub - litb
Maybe. I thought everyone knew hash tables and trees solved different problems.
anon
Maybe, except me!
AraK
+13  A: 

Unordered sets have to pay for their O(1) average access time in a few ways:

  • set uses less memory then unordered_set to store the same number of elements.
  • For a small number of elements, lookups in a set might be faster than lookups in an unordered_set.
  • Even though many operations are faster in the average case for unordered_set, they are often guaranteed to have better worst case complexities for set (for example insert).
  • That set sorts the elements is useful if you want to access them in order.
  • You can compare different sets with == and the like. unordered_sets are not required to support these operations.
sth
+1, all excellent points. People tend to overlook the fact that hashtables have O(1) *average-case* access time, meaning they can occasionally have big delays. The distinction can be important for real-time systems.
j_random_hacker
+3  A: 

Consider sweepline algorithms. These algorithms would fail utterly with hash tables, but work beautifully with balanced trees. To give you a concrete example of a sweepline algorithm consider fortune's algorithm. http://en.wikipedia.org/wiki/Fortune%27s%5Falgorithm

ldog