ansaurus

Question

std::vector on VisualStudio2008 appears to be suboptimally implemented - too many copy constructor calls

Answer 1

+6 A:

My question is:

(a) why is the copy constructor called so often?

Because when the vector is re-sized you need to copy all the elements from the old buffer into the new buffer. This is because the vector guarantees that the objects are stored in consecutive memory locations.

(b) is there any way to avoid using the copy constructor if you're just moving an object from one location to another?

No, there is no way to avoid the use of the copy constructor.
This is because the object has several members that need to be initialized correctly.
If you used memcpy, how would you know that the new object had been initialized correctly?

For example, suppose the object contained a smart pointer. You can't just memcpy a smart pointer; it needs to do extra work to track ownership. Otherwise, when the original goes out of scope the memory is deleted and the new object is left with a dangling pointer. The same principle applies to every object whose copy constructor does real work.
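As a rough illustration of that distinction: C++11 and later (so newer than the compiler discussed here) let you ask whether a bytewise copy is a valid copy through the standard std::is_trivially_copyable trait. The types PlainPoint and OwnsBuffer and the helper safe_to_memcpy below are made up for this sketch and do not come from the question.

```cpp
#include <memory>
#include <string>
#include <type_traits>

// Hypothetical example types, not taken from the question.
struct PlainPoint {   // only built-in members: a bytewise copy is a valid copy
    int x;
    int y;
};

struct OwnsBuffer {   // the smart pointer must update a reference count on copy
    std::shared_ptr<int> data;
    std::string name;
};

// Hypothetical helper: true when the language guarantees that memcpy
// produces a valid copy of T.
template <typename T>
constexpr bool safe_to_memcpy() {
    return std::is_trivially_copyable<T>::value;
}

static_assert(safe_to_memcpy<PlainPoint>(), "PlainPoint can be copied bytewise");
static_assert(!safe_to_memcpy<OwnsBuffer>(), "OwnsBuffer needs its copy constructor");
```

A container that cannot prove the first case has to fall back to the copy constructor, which is the situation described above.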

The way to avoid these repeated copies of the contents is to reserve the space.
This makes vector allocate enough space for all the objects it will store. Thus it does not need to keep reallocating the main buffer. It just copies the objects into the vector.

Doubling each time should call the copy constructor 64 times. If you were very concerned about keeping memory usage low, then increasing by 50% each time should call the copy constructor 121 times. So where does the 177 come from?

Vector allocated size = 1:
Add element 1: (no reallocation) But copies element 1 into vector.
Add element 2: Reallocate buffer (size 2): Copy element 1 across. Copy element 2 into vector.
Add element 3: Reallocate buffer (size 4): Copy element 1-2 across. Copy element 3 into vector.
Add element 4: Copy element 4 into vector
Add element 5: Reallocate buffer (size 8): Copy element 1-4 across. Copy element 5 into vector.
Add element 6: Copy element 6 into vector
Add element 7: Copy element 7 into vector
Add element 8: Copy element 8 into vector
Add element 9: Reallocate buffer (size 16): Copy element 1-8 across. Copy element 9 into vector.
Add element 10: Copy element 10 into vector
etc.

First 10 elements took 25 copy constructions.
If you had used reserve first it would have only taken 10 copy constructions.
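To check this on your own compiler, here is a minimal sketch. Counted is a hypothetical stand-in for the class in the question (it only counts copy constructions), and copies_for_push_backs is a name invented for this example.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the question's class: it only counts copies.
struct Counted {
    static int copies;
    int value;
    explicit Counted(int v) : value(v) {}
    // Declaring a copy constructor suppresses the implicit move constructor,
    // so every transfer of a Counted goes through this function.
    Counted(const Counted& other) : value(other.value) { ++copies; }
};
int Counted::copies = 0;

// Pushes n elements and returns how many copy constructions that took.
int copies_for_push_backs(int n, bool reserve_first) {
    Counted::copies = 0;
    std::vector<Counted> v;
    if (reserve_first) {
        v.reserve(static_cast<std::size_t>(n));  // one allocation up front
    }
    for (int i = 0; i < n; ++i) {
        v.push_back(Counted(i));  // one copy from the temporary into the vector
    }
    return Counted::copies;
}
```

copies_for_push_backs(10, true) should return exactly 10, since the buffer never has to be reallocated. Without the reserve the result depends on the growth policy of your standard library, so no fixed number is claimed here.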

Martin York 2009-03-16 02:03:40

Everything you say is correct, but it doesn't really answer my question. If you reread my question, I understand why the copy constructor is called, why there's no generic way to avoid it, etc.

Tim Cooper 2009-03-16 02:41:29

@Tim: You should re-read the statement: This is because the object has several ...

Martin York 2009-03-16 14:24:17

Answer 2

+4 A:

The STL does tend to cause this sort of thing. The spec doesn't allow memcpy'ing because that doesn't work in all cases. There's a document describing EASTL, a bunch of alterations made by EA to make it more suitable for their purposes, which does have a method of declaring that a type is safe to memcpy. Unfortunately it's not open source AFAIK so we can't play with it.

IIRC Dinkumware STL (the one in VS) grows vectors by 50% each time.

However, doing a series of push_back's on a vector is a common inefficiency. You can either use reserve to alleviate it (at the cost of possibly wasting memory if you overestimate significantly) or use a different container - deque performs better for a series of insertions like that but is a little slower in random access, which may/may not be a good tradeoff for you.
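A small sketch of the deque alternative, under the assumption that your element type looks roughly like the hypothetical Tracked below (a type that counts its own copy constructions). Because std::deque::push_back never relocates the elements already stored, each insertion should cost exactly one copy.

```cpp
#include <deque>

// Hypothetical element type that counts its own copy constructions.
struct Tracked {
    static int copies;
    int value;
    explicit Tracked(int v) : value(v) {}
    Tracked(const Tracked& other) : value(other.value) { ++copies; }
};
int Tracked::copies = 0;

// Pushes n elements onto a deque and returns the number of copies made.
int copies_for_deque_push_backs(int n) {
    Tracked::copies = 0;
    std::deque<Tracked> d;
    for (int i = 0; i < n; ++i) {
        d.push_back(Tracked(i));  // existing elements stay where they are
    }
    return Tracked::copies;
}
```

For 50 insertions this gives 50 copies, compared with the 177 reported in the question. The price is non-contiguous storage, which is where the slower random access comes from.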

Or you could look at storing pointers instead of values which will make the resizing much cheaper if you're storing large elements. If you're storing large objects this will always win because you don't have to copy them ever - you'll always save that one copy for each item on insertion at least.

Peter 2009-03-16 02:28:10

Answer 3

A:

If I recall correctly, C++0x may have move semantics (in addition to copy semantics). That said, you can implement a more efficient copy constructor if you really want to.

Unless the copy constructor is complex, it is normally very efficient - after all, you are supposed to be doing little more than merely copying the object, and copying memory is very fast these days.

Arafangion 2009-03-16 03:13:43

Answer 4

A:

It looks like additions to C++0x will help here; see Rvalue and STL upgrades.
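For a rough idea of what that looks like once a compiler supports it (this is C++11 syntax and will not build on VS2008), here is a sketch. Movable, MoveStats and push_back_temporaries are hypothetical names for this example.

```cpp
#include <vector>

// Hypothetical element type that counts copies and moves separately.
struct Movable {
    static int copies;
    static int moves;
    int value;
    explicit Movable(int v) : value(v) {}
    Movable(const Movable& other) : value(other.value) { ++copies; }
    // noexcept matters: during reallocation std::vector only moves
    // elements when the move constructor promises not to throw.
    Movable(Movable&& other) noexcept : value(other.value) { ++moves; }
};
int Movable::copies = 0;
int Movable::moves = 0;

struct MoveStats {
    int copies;
    int moves;
};

// Pushes n temporaries and reports how they were transferred.
MoveStats push_back_temporaries(int n) {
    Movable::copies = 0;
    Movable::moves = 0;
    std::vector<Movable> v;
    for (int i = 0; i < n; ++i) {
        v.push_back(Movable(i));  // the temporary is moved, not copied
    }
    MoveStats stats = {Movable::copies, Movable::moves};
    return stats;
}
```

With a noexcept move constructor the copy count should stay at zero, and the reallocations show up as moves instead. For a type like this one a move is no cheaper than a copy; the gain comes with types that own heap memory and can hand it over.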

Dan 2009-03-16 03:19:41

Answer 5

+5 A:

Don't forget to count the copy constructor calls needed to push_back a temporary C object into the vector. Each iteration will call C's copy constructor at least once.

If you add more printing code, it's a bit clearer what is going on:

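// C is the class from the question; needs <iostream> and <vector>.
std::vector<C> A;
std::vector<C>::size_type prevCapacity = A.capacity();

for (int i = 0; i < 50; i++) {
    A.push_back(i);  // builds a temporary C from i, then copies it in
    if (prevCapacity != A.capacity()) {
        // A change in capacity means the buffer was reallocated.
        std::cout << "capacity " << prevCapacity << " -> " << A.capacity() << "\n";
    }
    prevCapacity = A.capacity();
}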

This has the following output:

capacity 0 -> 1
capacity 1 -> 2
capacity 2 -> 3
capacity 3 -> 4
capacity 4 -> 6
capacity 6 -> 9
capacity 9 -> 13
capacity 13 -> 19
capacity 19 -> 28
capacity 28 -> 42
capacity 42 -> 63

So yes, the capacity increases by 50% each time, and this accounts for 127 of the copies:

1 + 2 + 3 + 4 + 6 + 9 + 13 + 19 + 28 + 42 = 127

Add the 50 additional copies from 50 calls to push_back and you have 177:

127 + 50 = 177
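The same arithmetic can be reproduced without a real vector. The sketch below is only a model: grow_by_half is my guess at a rule that matches the capacities printed above (grow by half, rounded down, but by at least one slot), not the library's actual source, and all names are invented for this example.

```cpp
#include <cstddef>

// Assumed 50% policy, chosen to match the printed capacities above.
std::size_t grow_by_half(std::size_t capacity) {
    std::size_t grown = capacity + capacity / 2;
    return grown > capacity ? grown : capacity + 1;
}

// Doubling policy, for comparison.
std::size_t double_capacity(std::size_t capacity) {
    return capacity == 0 ? 1 : capacity * 2;
}

struct CopyCount {
    std::size_t reallocation;  // copies of existing elements into a new buffer
    std::size_t insertion;     // copies of the new element into the vector
};

// Models n push_back calls on a vector that starts with initial_capacity.
CopyCount simulate_push_backs(std::size_t n,
                              std::size_t (*grow)(std::size_t),
                              std::size_t initial_capacity) {
    CopyCount count = {0, 0};
    std::size_t capacity = initial_capacity;
    for (std::size_t size = 0; size < n; ++size) {
        if (size == capacity) {
            count.reallocation += size;  // every existing element moves across
            capacity = grow(capacity);
        }
        ++count.insertion;
    }
    return count;
}
```

simulate_push_backs(50, grow_by_half, 0) gives 127 reallocation copies and 50 insertion copies, which is the 177 from the question. simulate_push_backs(10, double_capacity, 1) gives 15 and 10, matching the 25 copies in the doubling walk-through in Answer 1.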

bk1e 2009-03-16 03:26:16

Answer 6

A:

To circumvent this issue, why not use a vector of pointers instead of a vector of objects? Then delete each element when destructing the vector.

In other words, std::vector<C*> instead of std::vector<C>. Memcpy'ing pointers is very fast.
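A sketch of that idea, with one substitution: instead of raw C* and a manual delete loop it uses std::unique_ptr, which is C++11 and so not available on VS2008, and which frees each element automatically. Big and copies_with_pointer_elements are hypothetical names.

```cpp
#include <memory>
#include <vector>

// Hypothetical large element type that counts its own copy constructions.
struct Big {
    static int copies;
    int value;
    explicit Big(int v) : value(v) {}
    Big(const Big& other) : value(other.value) { ++copies; }
};
int Big::copies = 0;

// Stores n heap-allocated elements and returns how many Big copies were made.
int copies_with_pointer_elements(int n) {
    Big::copies = 0;
    std::vector<std::unique_ptr<Big>> v;
    for (int i = 0; i < n; ++i) {
        // Only the pointer is moved into (and around) the vector.
        v.push_back(std::unique_ptr<Big>(new Big(i)));
    }
    return Big::copies;  // each unique_ptr deletes its Big when v is destroyed
}
```

The function should return 0 for any n, because reallocation only shuffles pointers. Each element still costs one heap allocation.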

zildjohn01 2009-03-16 03:44:22

1. Heap allocation is slower than stack allocation. 2. The bad data locality of the pointers in the vector makes the non-pointer version with consecutive objects run circles around the pointer version when the vector is actually *used*.

Johann Gerell 2009-03-16 08:08:39

I guess solving one problem always creates another one, haha.

zildjohn01 2009-03-19 14:01:16

Answer 7

A:

Just a note: be careful about adding pointers to the vector as a way of minimizing copying costs, since

1. The bad data locality of the pointers in the vector makes the non-pointer version with consecutive objects run circles around the pointer version when the vector is actually used.
2. Heap allocation is slower than stack allocation.

Do you more often use the vector or add stuff to it?

Johann Gerell 2009-03-16 08:10:39
