
I've been comparing an STL-based implementation of a popular XmlRpc library with an implementation that mostly avoids the STL. The STL implementation is much slower: I got the run time down from 47s to 4.5s. I've diagnosed some of the reasons. It's partly due to std::string being misused (e.g. the author should have used "const std::string&" wherever possible; don't just use std::strings as if they were Java strings), but it's also because copy constructors were being called every time the vector outgrew its buffer, which was exceedingly often. The copy constructors were very slow because they did deep copies of trees (of XmlRpc values).

I was told by someone else on Stack Overflow that std::vector implementations typically double the size of the buffer each time they outgrow it. This does not seem to be the case on Visual Studio 2008: adding 50 items to a std::vector took 177 calls of the copy constructor. Doubling each time should call the copy constructor 64 times. If you were very concerned about keeping memory usage low, then increasing by 50% each time should call the copy constructor 121 times. So where does the 177 come from?

My question is: (a) why is the copy constructor called so often? (b) is there any way to avoid using the copy constructor if you're just moving an object from one location to another? (In this case, and indeed in most cases, a memcpy() would have sufficed, and this makes a BIG difference.)

(NB: I know about vector::reserve(), I'm just a bit disappointed that application programmers would need to implement the doubling trick when something like this is already part of any good STL implementation.)

My test program:

#include <string>
#include <iostream>
#include <vector>

using namespace std;

// Global counters (zero-initialized) for each kind of special member call.
int constructorCalls;
int assignmentCalls;
int copyCalls;

class C {
    int n;

public:
    C(int _n) { n = _n; constructorCalls++; }       // converting constructor (int -> C)
    C(const C& orig) { copyCalls++; n = orig.n; }   // copy constructor
    C& operator=(const C& orig) { assignmentCalls++; n = orig.n; return *this; }
};

int main(int argc, char* argv[])
{
    std::vector<C> A;

    //A.reserve(50);
    for (int i = 0; i < 50; i++)
        A.push_back(i);   // i is converted to a temporary C, which is then copied into A
    cout << "constructor calls = " << constructorCalls << "\n";
    cout << "assignment calls = " << assignmentCalls << "\n";
    cout << "copy calls = " << copyCalls << "\n";
    return 0;
}
A:

My question is:

(a) why is the copy constructor called so often?

Because when the vector is re-sized you need to copy all the elements from the old buffer into the new buffer. This is because the vector guarantees that the objects are stored in consecutive memory locations.

(b) is there any way to avoid using the copy constructor if you're just moving an object from one location to another?

No, there is no way to avoid the use of the copy constructor.
This is because the object has several members that need to be initialized correctly.
If you used memcpy, how would you know the object had been initialized correctly?

For example, suppose the object contained a smart pointer. You can't just memcpy a smart pointer: it needs to do extra work to track ownership. Otherwise, when the original goes out of scope the memory is deleted and the new object is left with a dangling pointer. The same principle applies to every object that has a (copy) constructor: the constructor does work that is actually required.

The way to avoid copying the existing contents is to reserve the space up front.
This makes the vector allocate enough space for all the objects it will store, so it does not need to keep reallocating the main buffer. It just copies each object into the vector once.

Doubling each time should call the copy constructor 64 times. If you were very concerned about keeping memory usage low, then increasing by 50% each time should call the copy constructor 121 times. So where does the 177 come from?

Assume the vector's allocated size starts at 1 and doubles whenever it is full:
Add element 1: (no reallocation) Copy element 1 into vector.
Add element 2: Reallocate buffer (size 2): Copy element 1 across. Copy element 2 into vector.
Add element 3: Reallocate buffer (size 4): Copy elements 1-2 across. Copy element 3 into vector.
Add element 4: Copy element 4 into vector.
Add element 5: Reallocate buffer (size 8): Copy elements 1-4 across. Copy element 5 into vector.
Add element 6: Copy element 6 into vector.
Add element 7: Copy element 7 into vector.
Add element 8: Copy element 8 into vector.
Add element 9: Reallocate buffer (size 16): Copy elements 1-8 across. Copy element 9 into vector.
Add element 10: Copy element 10 into vector.
etc.

The first 10 elements took 25 copy constructions: 15 from reallocations (1 + 2 + 4 + 8) plus one for each of the 10 new elements.
If you had called reserve first, it would have taken only 10 copy constructions.

Martin York
Everything you say is correct, but it doesn't really answer my question. If you reread my question, you'll see that I understand why the copy constructor is called, why there's no generic way to avoid it, etc.
Tim Cooper
@Tim: You should re-read the statement: This because the object has several ...
Martin York
A:

The STL does tend to cause this sort of thing. The spec doesn't allow memcpy'ing because that doesn't work in all cases. There's a document describing EASTL, a bunch of alterations made by EA to make it more suitable for their purposes, which does have a method of declaring that a type is safe to memcpy. Unfortunately it's not open source AFAIK so we can't play with it.

IIRC Dinkumware STL (the one in VS) grows vectors by 50% each time.

However, doing a series of push_backs on a vector is a common inefficiency. You can either use reserve to alleviate it (at the cost of possibly wasting memory if you overestimate significantly) or use a different container: deque performs better for a series of insertions like that but is a little slower at random access, which may or may not be a good tradeoff for you.

Or you could look at storing pointers instead of values, which will make the resizing much cheaper if you're storing large elements. If you're storing large objects, this will always win because you never have to copy them; you'll always save at least the one copy per item on insertion.

Peter
A: 

If I recall correctly, C++0x may add move semantics (in addition to copy semantics). That said, you can implement a more efficient copy constructor if you really want to.

Unless the copy constructor is complex, it is normally very efficient - after all, you are supposed to be doing little more than merely copying the object, and copying memory is very fast these days.

Arafangion
A: 

It looks like additions to C++0x will help here; see Rvalue and STL upgrades.

Dan
A:

Don't forget to count the copy constructor calls needed to push_back a temporary C object into the vector. Each iteration will call C's copy constructor at least once.

If you add more printing code, it's a bit clearer what is going on:

std::vector<C> A;
std::vector<C>::size_type prevCapacity = A.capacity();

for (int i=0; i < 50; i++) {
    A.push_back(i);
    if(prevCapacity != A.capacity()) {
       cout << "capacity " << prevCapacity << " -> " << A.capacity() << "\n";
    }
    prevCapacity = A.capacity();
}

This has the following output:

capacity 0 -> 1
capacity 1 -> 2
capacity 2 -> 3
capacity 3 -> 4
capacity 4 -> 6
capacity 6 -> 9
capacity 9 -> 13
capacity 13 -> 19
capacity 19 -> 28
capacity 28 -> 42
capacity 42 -> 63

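So yes, the capacity increases by about 50% each time (integer arithmetic makes the early steps grow by just one). Each reallocation happens when the vector is full, so it copies every existing element, i.e. the old capacity's worth; summed over all reallocations, this accounts for 127 of the copies: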

1 + 2 + 3 + 4 + 6 + 9 + 13 + 19 + 28 + 42 = 127

Add the 50 additional copies from 50 calls to push_back and you have 177:

127 + 50 = 177
bk1e
A: 

To circumvent this issue, why not use a vector of pointers instead of a vector of objects? Then delete each element when destructing the vector.

In other words, std::vector<C*> instead of std::vector<C>. Memcpy'ing pointers is very fast.

zildjohn01
1. Heap allocation is slower than stack allocation. 2. The bad data locality of the pointers in the vector makes the non-pointer version with consecutive objects run circles around the pointer version when the vector is actually *used*.
Johann Gerell
I guess solving one problem always creates another one, haha.
zildjohn01
A: 

Just a note: be careful about storing pointers in the vector as a way of minimizing copying costs, since

  1. The bad data locality of the pointers in the vector makes the non-pointer version with consecutive objects run circles around the pointer version when the vector is actually used.
  2. Heap allocation is slower than stack allocation.

Which do you do more often: use the vector, or add elements to it?

Johann Gerell