tags:

views:

1370

answers:

10

I am trying to optimize a C++ routine. The main bottleneck in this routine is the push_back() of a vector of objects. I tried using a deque instead and even tried a list. But strangely (and contrary to theory) deque and list implementations run much slower than the vector counterpart.

In fact, even clear() runs much slower for the deque and list implementations than for the vector. Here too, the vector implementation seems to be the fastest, while the list implementation is the slowest.

Any pointers?

Note: vector reserve() could have sped up the implementation, but it cannot be used because the final size is unknown.

Thanks.

+2  A: 

Are you pushing back the objects themselves, or pointers to them? Pointers will usually be much faster, as it's only 4-8 bytes to copy, compared to the full size of the object.
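To make this concrete, here is a minimal sketch (the `Heavy` struct and its 1 KiB payload are made-up illustrations, not from the question): pushing full copies invokes the copy constructor for every element, while pushing pointers copies only the pointer itself. Note that with pointers the caller must keep the pointed-to objects alive.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A deliberately large object: copying it invokes the (implicit)
// copy constructor, which copies all 1024 bytes of payload.
struct Heavy {
    char payload[1024] = {};
};

// Push full copies: each push_back copies the whole ~1 KiB object.
std::vector<Heavy> fill_by_value(std::size_t n) {
    std::vector<Heavy> v;
    for (std::size_t i = 0; i < n; ++i) v.push_back(Heavy{});
    return v;
}

// Push pointers: each push_back copies only sizeof(Heavy*) bytes,
// but the pool must outlive the pointer vector.
std::vector<const Heavy*> fill_by_pointer(const std::vector<Heavy>& pool) {
    std::vector<const Heavy*> v;
    for (const Heavy& h : pool) v.push_back(&h);
    return v;
}
```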

Yes, I am pushing back the object. I will try with pointers. Thanks.
Vidya
The object's data isn't just copied byte for byte; the copy constructor is called, which is slower than a plain memcpy in some cases.
strager
A: 

You'll need to give more information on the behavior of the routine.

In one place you're concerned about the speed of push_back(); in another you're concerned about clear(). Are you building up the container, doing something, then dumping it?

The results you see for clear() are because vector<> only has to release a single block of memory, deque<> has to release several, and list<> has to release one for each element.
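The per-container difference can be made visible with a minimal counting allocator, a sketch in which `CountingAlloc` and the element counts are made up for illustration. It counts the blocks each container's allocator releases when the container is destroyed (for a vector, clear() itself typically keeps the buffer; the single block is freed at destruction): a vector frees only the handful of blocks left over from its growth, while a list frees one node per element.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <new>
#include <vector>

// Global counter of deallocate() calls made through CountingAlloc.
static std::size_t g_deallocs = 0;

// A minimal C++11 allocator that counts how many separate blocks
// are handed back to it.
template <class T>
struct CountingAlloc {
    using value_type = T;
    CountingAlloc() = default;
    template <class U> CountingAlloc(const CountingAlloc<U>&) {}
    T* allocate(std::size_t n) {
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) {
        ++g_deallocs;
        ::operator delete(p);
    }
};
template <class T, class U>
bool operator==(const CountingAlloc<T>&, const CountingAlloc<U>&) { return true; }
template <class T, class U>
bool operator!=(const CountingAlloc<T>&, const CountingAlloc<U>&) { return false; }

// How many blocks does a vector of n ints release when destroyed?
// Only the blocks discarded during growth plus the final buffer.
std::size_t vector_dealloc_count(std::size_t n) {
    g_deallocs = 0;
    {
        std::vector<int, CountingAlloc<int>> v;
        for (std::size_t i = 0; i < n; ++i) v.push_back(static_cast<int>(i));
    }
    return g_deallocs;
}

// How many blocks does a list of n ints release? One per node.
std::size_t list_dealloc_count(std::size_t n) {
    g_deallocs = 0;
    {
        std::list<int, CountingAlloc<int>> l;
        for (std::size_t i = 0; i < n; ++i) l.push_back(static_cast<int>(i));
    }
    return g_deallocs;
}
```

The exact vector count depends on the implementation's growth factor, but it stays logarithmic in n, whereas the list count is linear.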

Michael Burr
Results are similar for push_back() as well, with vector being the fastest and list the slowest. The main routine that I am trying to optimize has a for loop (going up to a count of 100 million or more) within which an object is pushed back. Its parent routine clears the vector.
Vidya
Re using vector::reserve() - do you have a reasonable guess as to how many elements might be added? Performing a reserve(some_decent_guess) should reduce the number of realloc/copy cycles that occur, and as long as you're not reserving a huge amount it shouldn't hurt anything.
Michael Burr
+4  A: 

http://www.codeproject.com/KB/stl/vector_vs_deque.aspx
A nice article about vector vs. deque performance and memory usage.

bb
A: 

Deque has a more complex structure than vector and the speed differences between the two will be heavily dependent on both the specific implementation and the actual number of elements pushed back, but for large amounts of data it should be faster. clear() may be slower because it may choose to get rid of the more complex underlying structures. Much the same goes for list.

anon
+1  A: 

vector being faster to build or clear than deque or list is to be expected; it's a simpler data structure.

With regard to vector::push_back, it has to do two things:

  1. check the vector is big enough to hold the new item.
  2. insert the new item.

You can generally speed things up by eliminating step 1 by simply resizing the vector and using operator[] to set items.

UPDATE: Original poster asked for an example. The code below times 128 mega insertions, and outputs

push_back           : 2.04s
reserve & push_back : 1.73s
resize & place      : 0.48s

when compiled and run with g++ -O3 on Debian/Lenny on an old P4 machine.

#include <iostream>
#include <time.h>
#include <vector>

int main(int,char**)
{
  const size_t n=(128<<20);

  // 1: plain push_back; the vector reallocates (and copies) as it grows.
  const clock_t t0=clock();
  {
    std::vector<unsigned char> a;
    for (size_t i=0;i<n;i++) a.push_back(static_cast<unsigned char>(i));
  }

  // 2: reserve up front; no reallocations, but push_back still
  // checks capacity on every insertion.
  const clock_t t1=clock();
  {
    std::vector<unsigned char> a;
    a.reserve(n);
    for (size_t i=0;i<n;i++) a.push_back(static_cast<unsigned char>(i));
  }

  // 3: resize once, then write through operator[]; no per-element checks.
  const clock_t t2=clock();
  {
    std::vector<unsigned char> a;
    a.resize(n);
    for (size_t i=0;i<n;i++) a[i]=static_cast<unsigned char>(i);
  }
  const clock_t t3=clock();

  std::cout << "push_back           : " << (t1-t0)/static_cast<float>(CLOCKS_PER_SEC) << "s" << std::endl;
  std::cout << "reserve & push_back : " << (t2-t1)/static_cast<float>(CLOCKS_PER_SEC) << "s" << std::endl;
  std::cout << "resize & place      : " << (t3-t2)/static_cast<float>(CLOCKS_PER_SEC) << "s" << std::endl;

  return 0;
}
timday
Could you please give me an example of the same? Thanks.
Vidya
Simpler doesn't always mean faster. A vector may be simple, but a hash map is faster for non-sequential non-int key lookups (in almost all cases).
strager
A: 

Regarding push_back() being slow and reserve being no help, the implementation of STL used in MSVC works something like this: When you first create a vector it reserves space for I think 10 elements. From then on, whenever it gets full, it reserves space for 1.5 times the number of elements in the vector. So, something like 10, 15, 22, 33, 49, 73, 105, 157... The re-allocations are expensive.
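The growth sequence is easy to observe directly. The sketch below records every capacity change while pushing elements; each change corresponds to one reallocation (and a copy of everything already stored). The growth factor varies by implementation: libstdc++, for example, doubles rather than using 1.5x, so the exact numbers here will differ from the MSVC sequence above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Record the sequence of capacities a vector goes through while n
// elements are pushed back; each new value marks a reallocation.
std::vector<std::size_t> capacity_history(std::size_t n) {
    std::vector<int> v;
    std::vector<std::size_t> caps;
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<int>(i));
        if (caps.empty() || caps.back() != v.capacity())
            caps.push_back(v.capacity());
    }
    return caps;
}
```

Because growth is geometric, the number of reallocations stays logarithmic in the element count.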

Even if you don't know the exact size, reserve() can be useful. reserve() doesn't prevent the vector from growing if it needs to. If you reserve() and the vector grows beyond that size, you have still improved things because of the reserve. If the vector turns out to be much smaller, well, maybe that's OK; performance generally works better with smaller sizes anyway.

You need to profile in RELEASE mode to know for sure what strategy works best.

Corey Trager
+1 to release mode. STL in debug mode in MSVC is VERY slow (but well checked).
Eclipse
+1  A: 

If you don't know how many objects you'll be adding, it's very difficult to come up with an optimal solution. All you can do is try to minimize the cost that you know is happening - which in this case is that your vector is being constantly resized.

You could do this in two ways:

1) Split your operation into building and finalizing: build the data into a vector that is guaranteed to be big enough, and when done, copy it to another vector.

E.g.

std::vector<Foo> hugeVec;
hugeVec.reserve(1000);    // enough for 1000 foo's

// add stuff

std::vector<Foo> finalVec;
finalVec = hugeVec;

2) Alternatively, when your vector is full, call reserve with enough space for another batch of objects:

if (vec.capacity() == vec.size())
  vec.reserve(vec.size() + 16);  // alloc space for 16 more objects

You could choose a different container that did not result in all elements being copied upon a resize, but your bottleneck may then become the individual memory allocations for the new elements.
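As a sketch of how much the growth strategy matters (the step size and counts here are arbitrary illustrations, and the exact figures depend on the implementation honoring reserve's requested capacity), the following counts reallocations under a fixed-increment reserve, as in the snippet above, versus a doubling reserve:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Count reallocations while growing to n elements, reserving a
// fixed number of extra slots each time the vector fills up.
std::size_t grow_linear(std::size_t n, std::size_t step) {
    std::vector<int> v;
    std::size_t reallocs = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (v.capacity() == v.size()) {
            v.reserve(v.size() + step);  // fixed-size increments
            ++reallocs;
        }
        v.push_back(static_cast<int>(i));
    }
    return reallocs;
}

// Count reallocations while growing to n elements, doubling the
// reserved capacity each time the vector fills up.
std::size_t grow_exponential(std::size_t n) {
    std::vector<int> v;
    std::size_t reallocs = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (v.capacity() == v.size()) {
            v.reserve(v.empty() ? 16 : v.size() * 2);  // doubling
            ++reallocs;
        }
        v.push_back(static_cast<int>(i));
    }
    return reallocs;
}
```

With fixed increments the reallocation count grows linearly with n, so the total copying cost is quadratic; with doubling it stays logarithmic, which is why exponential growth gives better average performance despite occasionally over-allocating.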

Andrew Grant
You're much better off increasing the reserve exponentially than linearly (vec.reserve(vec.size() * 2)); it will give you much better average performance.
Eclipse
The problem with exponential growth is that it becomes awfully big awfully fast :) Not knowing how many items are involved, or the size of each item, I went for a safer example but yes, exponential could well be a better choice.
Andrew Grant
I think you'll be better off calling finalVec.swap(hugeVec) to avoid the unnecessary copies in the assignment operator.
Idan K
about exponential growth: it's merely a heuristic for what the app is going to need, and a very good one. Since push is the bottleneck, that means _loads_ of extension; if the memory can't take it, you need more memory.
xtofl
A great idea, though, to split 'build' and 'finalize'.
xtofl
A: 

If you want vector to be fast, you must reserve() enough space. It makes a huge difference, because each grow is terribly expensive. If you don't know the size, make a good guess.

Sanjaya R
+1  A: 

"push_back()" can be slow if copying the object is slow. If the default constructor is fast and you have a way to use swap to avoid the copy, you could have a much faster program.

void test_vector1()
{
    vector<vector<int> > vvi;
    for(size_t i=0; i<100; i++)
    {
        vector<int> vi(100000, 5);
        vvi.push_back(vi);    // copy of a large object
    }
}

void test_vector2()
{
    vector<int> vi0;
    vector<vector<int> > vvi;
    for(size_t i=0; i<100; i++)
    {
        vector<int> vi(100000, 5);
        vvi.push_back(vi0);  // copy of a small object
        vvi.back().swap(vi); // swap is fast
    }
}

Results :

VS2005-debug 
* test_vector1 -> 297
* test_vector2 -> 172

VS2005-release
* test_vector1 -> 203
* test_vector2 -> 94

gcc
* test_vector1 -> 343
* test_vector2 -> 188

gcc -O2
* test_vector1 -> 250
* test_vector2 -> 156
Rexxar
In test_vector2, just do vvi.resize(100) before the loop and remove vvi.push_back(vi0). I bet you'll get a considerable speedup.
Constantin
@Constantin : I have used "push_back" because Vidya says that the size of his vector is unknown in his program.
Rexxar
A: 

You have to choose your container according to what you're going to do with it.

Relevant actions are: extending (with push), insertion (may not be needed at all), extraction, deletion.

At cplusplus.com, there is a very nice overview of the operations per container type.

If the operation is push-bound, it makes sense that the vector beats all others. The good thing about deque is that it allocates fixed chunks, so it will make more efficient use of fragmented memory.

xtofl