views:

759

answers:

6

Hi,

what of those two is faster for random insertions and deletions? I guess list, having the values as the keys as it is with sets seems to be attractive too though. Is performance similar for iterating over the whole container?

Thanks!

+7  A: 

std::list is O(1) for inserts and deletions. But you may well need O(n) to find the insertion or deletion point. std::set is O(log(n)) for inserts and deletions, it is usually implemented as a red-black tree.

Consider the effort to find the insert/delete point to make your choice.

Hans Passant
thanks! so can you say that if I can live with the semantics of a set, it is most likely faster than a list? Or am I misunderstanding the O(log(n))?
I can't tell, you didn't specify how you are accessing the list to find the insertion/deletion point. If you already have an iterator anyway that happens to be at the right spot, O(1) is impossible to beat.
Hans Passant
ah yes, thanks! list is my choice!
+1  A: 

First think about semantics, then about performance.

If you have a set of integers, and you add 6, 8, 13, 8, 20, 6, and 50, you'll end up with a set containing { 6, 8, 13, 20, 50 }.

If you do that with a list, you'll end up with a list containing { 6, 8, 13, 8, 20, 6, 50 }.

So, what do you want? It doesn't make sense to compare the speed of containers with such different semantics.

Daniel Daranas
well for my purpose both would work.- Anyways I think I feel more comfortable with lists. thanks!
If both work then you're saying the order doesn't matter. This is in fact one point for a set, not for a list. You shouldn't choose a container based on what you're comfortable with: they are all very similar in terms of interface, so if you're comfortable with one you're comfortable with nearly all of them.
wilhelmtell
+2  A: 

If you care about speed then you should probably use std::vector. std::list performs one heap allocation every time an element is inserted and that's usually a bottleneck.

An exception is when individual items are very expensive to copy, or when you have an awful lot of them. In those cases a list is likely to perform better as it doesn't have to move items around when it's resized. A std::deque is also a good option here, but you'll need to profile your application in order to decide between the two.

Finally, use std::set only if you need to keep your items sorted (or if you don't want repeated items). Otherwise it will be significantly slower than a list or vector.

Manuel
std::set has other advantages, such as if you don't want dupes in the collection
anon
@Neil - Thanks, fixed
Manuel
hmmm I could use a list with preallocated memory though. (using a custom allocator). My biggest concern with vector is the deletion at random places because I am scared that it reallocates the whole array each time I do so. Is that an issue at all?
@mokaschitta: It is my understanding that when vectors grow, they hang on to their allocated memory when you delete elements or even when you clear the whole array. I don't know if this behavior is mandated by the standard, or just what is typically implemented (I'm sure someone will step in and clarify). In any case, you usually need to resort to this the trick (short of deleting the vector object itself) to reclaim memory from a previously large vector: `vec.swap(vector<int>());`
Emile Cormier
@mokaschitta - if you're that concerned about allocations you should consider using a memory pool (check Boost for that) or even better an intrusive container (again, check Boost)
Manuel
+2  A: 

You should measure performance yourself with realistic usage on realistic data. Check both typical and worst case performance.

Although std::vector has O(N) time complexity for random insertion, std::set O(log(N)) and std::list O(1), std::vector performs best in many cases. Only if performance is not important enough to spend time measuring, go by the big-O complexity.

"If you're not measuring you're not engineering" (Rico Mariani)

Don Reba
+5  A: 

List

  1. Searching (linear time).
  2. Inserting, deleting, moving (takes constant time).
  3. Elements may be ordered.
  4. Elements may be sorted.
  5. Elements may be duplicate.

Set

  1. Searching (logarithmic in size).
  2. Insert and delete (logarithimic in general).
  3. Elements are un-ordered.
  4. Elements are always sorted from lower to higher.
  5. Elements are unique.
Dave18
In sets, elements are ordered, just not by insertion order. They are sorted.
TokenMacGuy
I mistyped that one, thanks.
Dave18
@TokenMacGuy, @Dave17 C++ sets indeed are sorted, but they are *not* ordered in the sense that you can't enforce an arbitrary order by placing an arbitrary element in an arbitrary position. That's the key difference between "ordered" and "sorted". It's much better to view sets in their mathematical sense: they are simply not ordered, period. And the fact that they are sorted is an implementation detail that is there only to support efficiency. Bottom line: use sets if order doesn't matter but unique elements does.
wilhelmtell
+1 I added a new point to answer.
Dave18
@wilhelmtell: the sorted property of sets is not an arbitrary implementation detail, but rather a fixed guarantee of the library. Another collection, `std::hashset` provides set semantics but no in-order traversal. Different container for different use patterns.
TokenMacGuy
A: 

In an std::list, the insertion and deletion itself take a time in O(1), which means very fast, and above all means at a speed that does not depend on the number of elements in the list.

In an std::set, the insertion and deletion take time in O(log(N)), which means a bit slower if a lot of elements are contained in the set. The N in the expression O(log(N)) means the number of elements. Grosso modo, it means that the time taken by the operation is kind of proportional to the logarithm (the base does not matter here, since it is equivalent to multiplying by a constant, which is ignored in theoretical algorithm analysis) of the number of elements in the set.

But it is vital to take into account the time taken for finding the element to remove. If you must search the container for the element to remove, which is most probably the case, then the std::list will take quite a long time for this search, which will be in O(N) (which means not fast, because the time is directly proportional to the number of elements, not the logarithm of it), while the std::set will take a time in O(log N) for the search.

Also note that those theoretical analyses become absolutely invalid for containers with very few elements, in which case the multiplying constants they hide become more important than the time function family it focuses on.

To make it short: std::list => Slower for searching the element to delete; faster to delete it. std::set => Faster for searching the element to delete; less fast to delete it.

But for the whole operation, and for big number of elements, the std::set is better.

You should also consider using hash tables. Good implementions of those are available in Boost, Qt, or C++0x. They do all these operations in time tending towards O(1) (which mean very very fast).

Dom
Hashes are never O(1). Not even the good ones. Unless of course you custom-make a hash function that can only accepts a finite set of keys.
wilhelmtell