views: 122
answers: 2

Hi,

When I store a bunch of items and don't need random access to the container, I use std::list, which is mostly fine. However, sometimes (especially when I only push entries onto the back and never delete from the middle), I wish I had a structure with better performance for adding entries.

std::vector is bad because:

  • It must reallocate when it runs out of capacity.
  • It doesn't really work for huge amounts of data (because you cannot always get a very large chunk of contiguous free memory).

std::list is bad because:

  • It makes an allocation on every single push_back. That is slow and leads to a lot of memory fragmentation.

So, something in between is what I want.

Basically, I want something like std::list< boost::array<T, 100> >, or maybe, instead of 100, 4096/sizeof(T). It could also be std::list< std::vector<T> >, where the first vectors are small and later ones grow. Of course I want all of that hidden behind the interface, so I can just call mycontainer.push_back(x).
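To make it concrete, here is a rough sketch of the interface I'm imagining (ChunkedList and the chunk size of 100 are made up for illustration; this is not an existing library type):

```cpp
#include <cstddef>
#include <list>
#include <vector>

// Hypothetical "chunked" container: a list of fixed-capacity blocks, so an
// allocation happens only once every ChunkSize elements and existing
// elements are never moved.
template <typename T, std::size_t ChunkSize = 100>
class ChunkedList {
public:
    void push_back(const T& x) {
        if (chunks_.empty() || chunks_.back().size() == ChunkSize) {
            chunks_.push_back(std::vector<T>());   // start a new chunk
            chunks_.back().reserve(ChunkSize);     // inner vector never reallocates
        }
        chunks_.back().push_back(x);
        ++size_;
    }
    std::size_t size() const { return size_; }

private:
    std::list<std::vector<T>> chunks_;  // each inner vector holds up to ChunkSize elements
    std::size_t size_ = 0;
};

int main() {
    ChunkedList<int> c;
    for (int i = 0; i < 1000; ++i)
        c.push_back(i);
    return c.size() == 1000 ? 0 : 1;
}
```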

rope (from the SGI STL) is somewhat similar to that, but it isn't part of the standard.

Is there something like this in Boost or elsewhere?

Thanks, Albert

+8  A: 

Have you considered using std::deque? Its elements are not stored contiguously but it does allow random access to elements; if you are only inserting elements at the beginning or end of the sequence, it may give better performance than a std::vector.
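For example (purely illustrative; the numbers are arbitrary):

```cpp
#include <deque>
#include <iostream>

int main() {
    std::deque<int> d;
    for (int i = 0; i < 1000000; ++i)
        d.push_back(i);          // grows block by block; existing elements are never moved
    d.push_front(-1);            // also cheap at the front, unlike std::vector
    std::cout << d[500000] << ' ' << d.size() << '\n';   // random access still works
}
```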

James McNellis
Surprisingly few people know about std::deque, but it should really be the default choice for larger datasets.
Martin Beckett
I really should know the STL better! I always wondered what `deque` is for and how it actually differs internally from a `vector` (I thought it was just a single chunk whose start index shifts when you insert at the front).
Albert
+1  A: 

Yes, it's called std::vector. Its push_back is amortized O(1) time, which is almost always faster than std::list. (And it's also memory-efficient.)

The most important feature of std::list is constant-time deletion/insertion in the middle. If you don't need that, choose std::vector.
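For example (illustrative only; the reserve call is optional and just shows that you can avoid reallocations entirely if you know the size in advance):

```cpp
#include <iostream>
#include <vector>

int main() {
    std::vector<int> v;
    v.reserve(1000000);              // optional: one allocation up front, no reallocations later
    for (int i = 0; i < 1000000; ++i)
        v.push_back(i);              // amortized constant time per insertion
    std::cout << v.size() << ' ' << v.capacity() << '\n';
}
```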

ybungalobill
If you have a large `vector` of objects that are expensive to copy, the reallocation that occasionally occurs during a `push_back` may make `vector` an unacceptable choice.
James McNellis
No, push_back on a vector is not O(1); it's O(n), and on average it should be something like O(log n). And it doesn't work at all for large amounts of data because it can only allocate one single big chunk of contiguous memory. And std::list is slow because it has to make an allocation on every single push_back.
Albert
@Albert: No, insertion at the end (i.e., what `push_back` does) has amortized constant time complexity (O(1)). This means that on average it takes constant time to insert an element at the end. It is only when a reallocation is required that insertion at the end takes linear time. The implementation of vector is such that reallocations happen relatively infrequently, usually by having the underlying storage grow exponentially.
James McNellis
@James: Yes, I know. But isn't that O(log n) on average then? I.e., if you insert n items, you need about log(n) reallocations. In fact, probably even more than O(log n), because the reallocations don't take constant time (they take much longer for huge n).
Albert
@Albert: No, it's still O(1). Consider a vector with a size of 1,000 and a capacity of 1,000. You insert an element and a reallocation takes place, increasing the capacity to 2,000. This reallocation takes 1,000 time units. You can then insert 999 more elements in constant time. The time the reallocation took (1,000 time units) is amortized over the 1 + 999 = 1,000 insertions you can perform before the next reallocation; 1,000 / 1,000 = 1. [Yes, this is simple, and not all implementations use a 2x scale factor; I think VC++ uses 1.5x, for example, but this is about how it works.]
James McNellis
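A quick way to see this (purely illustrative; the exact growth factor is implementation-defined) is to print the capacity whenever it changes and watch the reallocations become exponentially rarer:

```cpp
#include <cstddef>
#include <iostream>
#include <vector>

int main() {
    std::vector<int> v;
    std::size_t last_capacity = 0;
    for (int i = 0; i < 1000000; ++i) {
        v.push_back(i);
        if (v.capacity() != last_capacity) {   // a reallocation just happened
            last_capacity = v.capacity();
            std::cout << "size " << v.size()
                      << " -> capacity " << last_capacity << '\n';
        }
    }
}
```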
@Albert It's O(1), read about amortized analysis. If you insert N objects then you do at most 2N copy operations *in total*.
ybungalobill
@ybungalobill: Ah, thanks, I see it now. I get about 3N copy operations when I try to calculate it, but now I see the point: that's about 3 copies per insertion on average, i.e. O(1). Thanks! :)
Albert