ansaurus

Question

How is push_back implemented in an STL vector?

Answer 1

A:

Thanks to some comments, I am completely revising a very incorrect original answer.

According to the STL spec, your answer was correct. The vector is implemented as a dynamically resized array:

Vector containers are implemented as dynamic arrays; Just as regular arrays, vector containers have their elements stored in contiguous storage locations, which means that their elements can be accessed not only using iterators but also using offsets on regular pointers to elements.

But unlike regular arrays, storage in vectors is handled automatically, allowing it to be expanded and contracted as needed.

Heath Hunnicutt 2010-04-12 20:09:21

No. Absolutely not. A vector is a dynamic array -- the standard requires that the underlying storage be the exact same as a builtin array. The standard requires only O(1) *amortized* time. Therefore, it need only be O(1) in the average case.

Billy ONeal 2010-04-12 20:11:18

The underlying buffer of a vector must be contiguous, meaning it cannot be a linked list. The push_back() method can still run in constant time (on average) if resizing doesn't occur during each call. This typically requires that the buffer increase in size by some factor (e.g. double in size whenever the vector is full).

Void 2010-04-12 20:13:39

Answer 2

+2 A:

Possibly what they were looking for is that push_back makes a copy of the object being pushed onto the vector (using its copy constructor).
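
A small way to observe the copy, as a sketch rather than anything from the standard library itself: `CopyCounter` and `copies_for_one_push_back` are made-up names for illustration, and the code assumes a C++11 compiler. Capacity is reserved first so that no reallocation adds extra copies.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper type: counts how often its copy constructor runs.
struct CopyCounter {
    static int copies;
    CopyCounter() = default;
    CopyCounter(const CopyCounter&) { ++copies; }
    CopyCounter& operator=(const CopyCounter&) = default;
};
int CopyCounter::copies = 0;

// Returns the number of copy constructions caused by one push_back of an
// lvalue when no reallocation is needed.
int copies_for_one_push_back() {
    std::vector<CopyCounter> v;
    v.reserve(1);            // room for one element, so no reallocation
    CopyCounter original;
    CopyCounter::copies = 0;
    v.push_back(original);   // stores a copy; 'original' is left untouched
    assert(v.size() == 1);
    return CopyCounter::copies;
}
```

Calling `copies_for_one_push_back()` should report a single copy construction.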

With regard to resizing: The standard says a.push_back(x) is equivalent to a.insert(a.end(),x). The definition of insert says, in part: “Causes reallocation if the new size is greater than the old capacity.”
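
The reallocation rule quoted above can be checked against whatever `std::vector` your compiler ships. This is only a sketch: `reallocates_only_when_full` is a made-up name, the reserved capacity of 4 is arbitrary, and `data()` requires C++11.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns true if the push_back that exceeds the old capacity ends up with
// a larger capacity, after checking that earlier calls did not reallocate.
bool reallocates_only_when_full() {
    std::vector<int> v;
    v.reserve(4);
    const std::size_t old_capacity = v.capacity();   // at least 4
    const int* const old_data = v.data();

    // Fill up to capacity: the new size never exceeds the old capacity,
    // so the buffer must stay where it is.
    while (v.size() < old_capacity) {
        v.push_back(0);
        assert(v.data() == old_data);
        assert(v.capacity() == old_capacity);
    }

    v.push_back(0);   // new size > old capacity: reallocation happens here
    return v.capacity() > old_capacity && v.size() == old_capacity + 1;
}
```

How much larger the new capacity is remains up to the implementation, so the sketch only checks that it grew.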

The standard says what the functions are supposed to do. But how they’re implemented is, in most cases, implementation-specific.

Nate 2010-04-12 20:12:54

Answer 3

A:

vector doesn't use a linked list. It uses contiguous memory.

If there is not enough reserved space, push_back allocates a new chunk of memory twice as large as the original buffer. That makes the amortized runtime O(1).

Axel Gneiting 2010-04-12 20:13:05

...a new chunk that's larger than the old by some constant factor anyway -- but the factor is typically around 1.5 rather than 2.

Jerry Coffin 2010-04-12 20:16:37

@Jerry, strange. A factor of 1.5 needs two copies per element on average, whereas a factor of 2 needs just one.

Pavel Shved 2010-04-12 20:24:57

@Pavel: Andrew Koenig wrote an article years ago about one reason to favor a smaller factor. With a factor of two, the sum of the chunks you've discarded will always be smaller than your next larger allocation, so you never get to reuse them for the same container. As long as the factor is below the golden ratio (about 1.618), they'll eventually add up to a chunk large enough to reuse for the next larger allocation.
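
A rough way to see the effect, under a deliberately idealized model: the freed chunks are assumed to be adjacent and mergeable into one hole, sizes are in arbitrary units, and the chunk currently in use cannot be freed until its contents have been copied. `first_reusable_step` is a made-up name, not something from the article.

```cpp
#include <cassert>

// Returns the first growth step at which the previously freed chunks
// together are large enough for the next allocation, or -1 if that never
// happens within max_steps. Idealized: ignores allocator behavior entirely.
int first_reusable_step(double factor, int max_steps) {
    assert(factor > 1.0);
    double freed = 0.0;     // total size of chunks already released
    double current = 1.0;   // chunk currently holding the elements
    for (int step = 1; step <= max_steps; ++step) {
        const double next = current * factor;
        if (freed >= next) return step;
        // The current chunk is released only after the copy, so it joins
        // the freed total for later steps.
        freed += current;
        current = next;
    }
    return -1;
}
```

In this model a factor of 1.5 first finds a fit at step 5, while a factor of 2 never does. A real allocator may not place or merge chunks this way.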

Jerry Coffin 2010-04-12 21:26:41

@Coffin, this is the case only if the underlying memory manager allocates the chunks thriftily. However, the manager may stick to 2^n chunks which renders the 1.5-scheme useless.

Pavel Shved 2010-04-13 20:01:57

Answer 4

+10 A:

An STL vector has a size (current number of stored elements) and a capacity (currently allocated storage space).

If size < capacity, a push_back simply puts the new element at the end and increments the size by 1.
If size == capacity before the push_back, a new, larger array is allocated (twice the size is common, but this is implementation-dependent afaik), all of the current data is copied over (including the new element), and the old allocated space is freed. This may throw an exception if the allocation fails.

The complexity of the operation is amortized O(1), which means during a push_back that causes a resize, it won't be a constant-time operation (but in general over many operations, it is).
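
The two cases above can be sketched roughly like this. It is a simplified illustration, not how any particular standard library does it: `IntVector` is a made-up class that stores only `int`, uses plain `new[]` instead of an allocator, and assumes a growth factor of 2.

```cpp
#include <cassert>
#include <cstddef>

// Simplified sketch for int only; a real std::vector uses an allocator,
// handles arbitrary element types and gives exception-safety guarantees.
class IntVector {
public:
    IntVector() : data_(nullptr), size_(0), capacity_(0) {}
    ~IntVector() { delete[] data_; }
    IntVector(const IntVector&) = delete;
    IntVector& operator=(const IntVector&) = delete;

    void push_back(int value) {
        if (size_ == capacity_) {
            // Full: allocate a larger array (factor 2 is an assumption).
            const std::size_t new_capacity = capacity_ == 0 ? 1 : capacity_ * 2;
            int* new_data = new int[new_capacity];   // may throw std::bad_alloc
            for (std::size_t i = 0; i < size_; ++i) new_data[i] = data_[i];
            delete[] data_;                          // free the old storage
            data_ = new_data;
            capacity_ = new_capacity;
        }
        data_[size_] = value;   // put the new element at the end
        ++size_;
    }

    int operator[](std::size_t i) const { assert(i < size_); return data_[i]; }
    std::size_t size() const { return size_; }
    std::size_t capacity() const { return capacity_; }

private:
    int* data_;
    std::size_t size_;
    std::size_t capacity_;
};
```

Pushing five values into an empty `IntVector` grows the capacity 1, 2, 4, 8, so only three of the five calls pay for a copy of the existing elements.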

tzaman 2010-04-12 20:13:05

I also want to point out that the amortized cost is O(1) when the new capacity is `k` times the old one, for some constant `k > 1`. It's also worth noting that, for a given `k`, each element is copied `1/(k-1)` times on average (for `k=2` it's just one additional copy).
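
To get a feel for these numbers, the copies caused by reallocation can simply be counted. This is a sketch with made-up names (`relocation_copies`), an assumed starting capacity of 1, and capacities truncated to whole elements. The count per element depends on where you stop: just before a reallocation it is close to `1/(k-1)`, just after one it is higher.

```cpp
#include <cassert>
#include <cstddef>

// Counts how many element copies reallocations perform while n elements
// are appended one by one, growing the capacity by 'factor' when full.
std::size_t relocation_copies(std::size_t n, double factor) {
    assert(factor > 1.0);
    std::size_t size = 0, capacity = 1, copies = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (size == capacity) {
            std::size_t grown = static_cast<std::size_t>(capacity * factor);
            if (grown <= capacity) grown = capacity + 1;   // always make progress
            copies += size;     // every existing element moves to the new buffer
            capacity = grown;
        }
        ++size;
    }
    return copies;
}
```

With a factor of 2, appending 1024 elements costs 1023 relocation copies (about one per element), and the 1025th append alone adds another 1024.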

Pavel Shved 2010-04-12 20:22:05

+1 for the amortization insight.

andand 2010-04-12 21:11:15

Also, `2` is not the most common factor. Most implementations have shifted to the golden ratio because it plays nicer with memory... this had been discussed here on SO and a comp.lang.c++.moderated thread was referenced.

Matthieu M. 2010-04-13 07:39:08

Ah, didn't know that. Thanks for the info; can you provide links to the discussions you mention?

tzaman 2010-04-13 08:24:00

Answer 5

+1 A:

That, of course, is inherently implementation defined. Assuming it's a question of how somebody would implement a dynamic array, in general, it'd be something like this:

push_back checks capacity() and ensures it's at least one larger than size().
If there is no capacity for the new element, the vector reallocates its entire underlying storage, copying over the contents of the old storage to the new storage. The old storage is deallocated.
The new element is copied to the end of the dynamic array.

Some STL implementations will elide some of the copies by using swaps (i.e. for containers of containers), but for the most part that's exactly how it works.

Billy ONeal 2010-04-12 20:13:34

Answer 6

A:

How much detail did the interviewer want? For example, was he looking for you to drill down into the lower level details?

Besides the usual resize-as-needed to retain the O(1) on average semantics, some things to consider include but are not limited to:

Exception safety: Does the implementation guarantee that the state of the vector will not be modified if an exception is thrown when appending the new element (e.g. rollback semantics)? For example, an exception could be thrown during allocation, copying, insertion, etc.
Proper use of allocator: Does the implementation correctly use the vector's allocator instead of plain new-based allocation (the two may or may not be the same)? Ideally, this will be handled transparently by the vector's resizing code, although implementations could certainly differ.
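
The rollback behavior in the first point can be probed with an element type whose copy constructor throws on demand. This is a sketch under assumptions: `Fragile` and `push_back_rolls_back_on_throw` are made-up names, and the code relies on `std::vector` leaving the container unchanged when copying a single element inserted at the end fails.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical element type whose copy constructor can be made to throw.
struct Fragile {
    static bool fail_on_copy;
    int value;
    explicit Fragile(int v) : value(v) {}
    Fragile(const Fragile& other) : value(other.value) {
        if (fail_on_copy) throw std::runtime_error("copy failed");
    }
    Fragile& operator=(const Fragile&) = default;
};
bool Fragile::fail_on_copy = false;

// Returns true if a failed push_back left the vector unchanged.
bool push_back_rolls_back_on_throw() {
    std::vector<Fragile> v;
    for (int i = 0; i < 3; ++i) v.push_back(Fragile(i));
    assert(v.size() == 3);

    const std::size_t size_before = v.size();
    const Fragile extra(99);
    bool threw = false;
    Fragile::fail_on_copy = true;    // every copy from now on throws
    try {
        v.push_back(extra);          // fails while copying an element
    } catch (const std::runtime_error&) {
        threw = true;
    }
    Fragile::fail_on_copy = false;

    return threw && v.size() == size_before &&
           v[0].value == 0 && v[1].value == 1 && v[2].value == 2;
}
```

Whether the exception comes from copying the new element or from copying old elements into a larger buffer depends on the capacity at that moment; either way the original three elements should still be there.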

Void 2010-04-12 20:30:14

Answer 7

+5 A:

template< typename T >
void std::vector<T>::push_back(const T& obj)
{
    this->insert(this->end(),obj);
}

sbi 2010-04-12 20:30:36

You mean `void std::vector<T>::push_back` ?

Seth Johnson 2010-04-12 20:47:53

@Seth: Thanks. I knew I would spoil it...

sbi 2010-04-12 20:48:52

Perfect response to a bad question. Technically correct without answering what they *meant* to ask.

Graphics Noob 2010-04-12 22:49:31

@Graphics: Maybe I'm dense, but I'm not sure what else was expected. If asked "How is push_back implemented in `std::vector`?", I think I would have given this answer in the interview.

sbi 2010-04-14 22:37:03

@sbi I would assume they're looking for an answer involving memory management (like tzaman's answer).

Graphics Noob 2010-04-14 23:45:57

@Graphics: I had interviews like that. They asked a question and I pointed out the error in it. Didn't work out.

sbi 2010-04-15 00:38:23

Answer 8

A:

Noah Roberts 2010-04-12 20:45:48

This seems like a fairly expensive implementation. You first copy the contents of the vector in reverse, which requires allocation + O(n) iteration to copy each element in the vector. Then you push the parameter to the *beginning* of the temporary vector, which requires yet another allocation and O(n) accompanying element copy operations. Then you create another copy of the temporary (allocation + O(n) reverse iteration/element copies) that is copy-assigned to the vector being operated on. That assignment will result in deallocation + O(n) element destruction operations. Kinda slow. :)

Void 2010-04-12 21:20:01
