views: 122
answers: 2

Hi,

When I store a bunch of items and don't need random access to the container, I use std::list, which is mostly fine. However, sometimes (especially when I only push entries onto the back and never delete from the middle), I wish I had a structure with better performance for adding entries.

std::vector is bad because:

  • It must reallocate when it runs out of capacity.
  • It doesn't really work for huge amounts of data (because you cannot always get a very large chunk of contiguous free memory).

std::list is bad because:

  • It makes an allocation on every single push_back. That is slow and leads to a lot of memory fragmentation.

So, something in between is what I want.

Basically, I want something like std::list< boost::array<T, 100> >, or maybe, instead of 100, 4096/sizeof(T). It could also be std::list< std::vector<T> >, where the first vectors are small and later ones grow. Of course I want all of that hidden behind the interface, so I can just call mycontainer.push_back(x).
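To make it concrete, here is a rough sketch of the interface I'm imagining (ChunkedList and the chunk size of 100 are made up for illustration; this is not an existing library type):

```cpp
#include <cstddef>
#include <list>
#include <vector>

// Hypothetical "chunked" container: a list of fixed-capacity blocks, so an
// allocation happens only once every ChunkSize elements and existing
// elements are never moved.
template <typename T, std::size_t ChunkSize = 100>
class ChunkedList {
public:
    void push_back(const T& x) {
        if (chunks_.empty() || chunks_.back().size() == ChunkSize) {
            chunks_.push_back(std::vector<T>());   // start a new chunk
            chunks_.back().reserve(ChunkSize);     // inner vector never reallocates
        }
        chunks_.back().push_back(x);
        ++size_;
    }
    std::size_t size() const { return size_; }

private:
    std::list<std::vector<T>> chunks_;  // each inner vector holds up to ChunkSize elements
    std::size_t size_ = 0;
};

int main() {
    ChunkedList<int> c;
    for (int i = 0; i < 1000; ++i)
        c.push_back(i);
    return c.size() == 1000 ? 0 : 1;
}
```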

rope (from the SGI STL) is somewhat similar to that, but it isn't part of the standard.

Is there something like this in Boost or elsewhere?

Thanks, Albert

+8  A: 

Have you considered using std::deque? Its elements are not stored contiguously but it does allow random access to elements; if you are only inserting elements at the beginning or end of the sequence, it may give better performance than a std::vector.
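For example (purely illustrative; the numbers are arbitrary):

```cpp
#include <deque>
#include <iostream>

int main() {
    std::deque<int> d;
    for (int i = 0; i < 1000000; ++i)
        d.push_back(i);          // grows block by block; existing elements are never moved
    d.push_front(-1);            // also cheap at the front, unlike std::vector
    std::cout << d[500000] << ' ' << d.size() << '\n';   // random access still works
}
```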

James McNellis
Surprisingly few people know about std::deque, but it should really be the default choice for larger datasets.
Martin Beckett
I really should know the STL better! I always wondered what `deque` is for and how it actually differs internally from a `vector` (I thought it was just a single chunk whose start index shifts when you insert at the front).
Albert
+1  A: 

Yes, it's called std::vector. Its push_back is amortized O(1) time, which is almost always faster than std::list. (And it's also memory-efficient.)

The most important feature of std::list is constant-time deletion/insertion in the middle. If you don't need that, choose std::vector.
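For example (illustrative only; the reserve call is optional and just shows that you can avoid reallocations entirely if you know the size in advance):

```cpp
#include <iostream>
#include <vector>

int main() {
    std::vector<int> v;
    v.reserve(1000000);              // optional: one allocation up front, no reallocations later
    for (int i = 0; i < 1000000; ++i)
        v.push_back(i);              // amortized constant time per insertion
    std::cout << v.size() << ' ' << v.capacity() << '\n';
}
```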

ybungalobill
If you have a large `vector` of objects that are expensive to copy, the reallocation that occasionally occurs during a `push_back` may make `vector` an unacceptable choice.
James McNellis
No, push_back on a vector is not O(1); it's O(n), and on average it should be something like O(log n). And it doesn't work at all for large amounts of data because it can only allocate one single big chunk of contiguous memory. And std::list is slow because it has to make an allocation on every single push_back.
Albert
@Albert: No, insertion at the end (i.e., what `push_back` does) has amortized constant time complexity (O(1)). This means that on average it takes constant time to insert an element at the end. It is only when a reallocation is required that insertion at the end takes linear time. The implementation of vector is such that reallocations happen relatively infrequently, usually by having the underlying storage grow exponentially.
James McNellis
@James: Yes, I know. But isn't that O(log n) on average then? I.e., if you insert n items, you need about log(n) reallocations. In fact, probably even more than O(log n), because the reallocations don't take constant time (they take much longer for huge n).
Albert
@Albert: No, it's still O(1). Consider a vector with a size of 1,000 and a capacity of 1,000. You insert an element and a reallocation takes place, increasing the capacity to 2,000. This reallocation takes 1,000 time units. You can then insert 999 more elements in constant time. The time the reallocation took (1,000 time units) is amortized over the 1 + 999 = 1,000 insertions you can perform before the next reallocation; 1,000 / 1,000 = 1. [Yes, this is simple, and not all implementations use a 2x scale factor; I think VC++ uses 1.5x, for example, but this is about how it works.]
James McNellis
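A quick way to see this (purely illustrative; the exact growth factor is implementation-defined) is to print the capacity whenever it changes and watch the reallocations become exponentially rarer:

```cpp
#include <cstddef>
#include <iostream>
#include <vector>

int main() {
    std::vector<int> v;
    std::size_t last_capacity = 0;
    for (int i = 0; i < 1000000; ++i) {
        v.push_back(i);
        if (v.capacity() != last_capacity) {   // a reallocation just happened
            last_capacity = v.capacity();
            std::cout << "size " << v.size()
                      << " -> capacity " << last_capacity << '\n';
        }
    }
}
```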
@Albert It's O(1), read about amortized analysis. If you insert N objects then you do at most 2N copy operations *in total*.
ybungalobill
@ybungalobill: Ah, thanks, I see it now. I get about 3N copy operations when I try to calculate it, but now I see the point: that's about 3 copies per insertion on average, i.e. O(1). Thanks! :)
Albert