I am thinking of how I can implement std::vector from the ground up.

How does it resize the vector?

realloc only seems to work for plain old structs, or am I wrong?

+1  A: 

It allocates a new array and copies everything over, so expanding it is quite inefficient if you have to do it often. If you know roughly how many elements you are going to push_back(), call reserve() first.
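To make the reserve() advice concrete, here is a small sketch that counts how often the capacity changes while pushing elements. The helper name count_reallocations is made up for this illustration, and it treats every capacity change as one reallocation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: counts how many times capacity() changes while
// pushing n ints. If reserve_first is true, room for n elements is
// requested up front.
std::size_t count_reallocations(std::size_t n, bool reserve_first) {
    std::vector<int> v;
    if (reserve_first) v.reserve(n);

    std::size_t reallocations = 0;
    std::size_t last_capacity = v.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<int>(i));
        if (v.capacity() != last_capacity) {  // storage was replaced
            ++reallocations;
            last_capacity = v.capacity();
        }
    }
    assert(v.size() == n);
    return reallocations;
}
```

With reserve_first set, the count is zero, because push_back never reallocates while size() stays within capacity(). Without it, the exact count depends on the growth policy of your standard library.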

Marcus Lindblom
to be picky, it copy constructs everything over =)
Viktor Sehr
+13  A: 

It is a simple class template that wraps a dynamically allocated array. It does not use malloc/realloc. Instead it uses the allocator passed as a template argument (which by default is std::allocator).

Resizing is done by allocating a new array and copy constructing each element in the new array from the old one (this way it is safe for non-POD objects). To avoid frequent allocations, implementations usually grow the capacity geometrically rather than by a fixed amount.

In addition to this, it will need to store the current "size" and "capacity". Size is how many elements are actually in the vector. Capacity is how many it could hold before it has to reallocate.

So as a starting point a vector will need to look somewhat like this:

template <class T, class A = std::allocator<T> >
class vector {
public:
    // public member functions
private:
    T* data_;
    typename A::size_type capacity_;
    typename A::size_type size_;
    A allocator_;
};
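As a rough sketch of how push_back could grow the storage with this layout: the class name mini_vector, the helper names, and the doubling policy are all assumptions for illustration, not what any particular standard library does. It assumes the allocator's pointer type is a plain T* (true for std::allocator) and ignores exception safety and copying.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical teaching class; mini_vector is not a real library type.
template <class T, class A = std::allocator<T> >
class mini_vector {
    typedef std::allocator_traits<A> traits;
public:
    typedef typename traits::size_type size_type;

    mini_vector() : data_(nullptr), capacity_(0), size_(0), allocator_() {}
    ~mini_vector() { release(); }
    mini_vector(const mini_vector&) = delete;             // copying omitted for brevity
    mini_vector& operator=(const mini_vector&) = delete;

    size_type size() const { return size_; }
    size_type capacity() const { return capacity_; }
    T& operator[](size_type i) { assert(i < size_); return data_[i]; }

    void push_back(const T& value) {
        if (size_ == capacity_) {
            grow_and_append(value);
        } else {
            traits::construct(allocator_, data_ + size_, value);
            ++size_;
        }
    }

private:
    void grow_and_append(const T& value) {
        // Assumed policy: double the capacity (start at 1).
        const size_type new_capacity = capacity_ == 0 ? 1 : capacity_ * 2;
        T* new_data = traits::allocate(allocator_, new_capacity);  // raw memory only
        // Construct the new element first, so `value` may refer to an old element.
        traits::construct(allocator_, new_data + size_, value);
        // Copy construct every existing element into the new block.
        for (size_type i = 0; i < size_; ++i)
            traits::construct(allocator_, new_data + i, data_[i]);
        const size_type old_size = size_;
        release();  // destroy the old elements and free the old block
        data_ = new_data;
        capacity_ = new_capacity;
        size_ = old_size + 1;
    }

    void release() {
        for (size_type i = 0; i < size_; ++i)
            traits::destroy(allocator_, data_ + i);
        if (data_) traits::deallocate(allocator_, data_, capacity_);
    }

    T* data_;
    size_type capacity_;
    size_type size_;
    A allocator_;
};
```

Pushing five elements into an empty mini_vector leaves size() at 5 and capacity() at 8 under the doubling policy assumed here. A real implementation would also handle copy and move, exceptions thrown by element constructors, and moving elements instead of copying them where that is safe.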

The other common implementation is to store pointers to the different parts of the array. This cheapens the cost of end() (which no longer needs an addition) ever so slightly at the expense of a marginally more expensive size() call (which now needs a subtraction). In which case it could look like this:

template <class T, class A = std::allocator<T> >
class vector {
public:
    // public member functions
private:
    T* data_;         // points to first element
    T* end_capacity_; // points to one past internal storage
    T* end_;          // points to one past last element
    A allocator_;
};

I believe gcc's libstdc++ does this. Both approaches are equally valid and conforming.
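To illustrate the trade-off, here is a sketch of how the accessors fall out of the three-pointer layout. The struct name three_pointer_layout and the full() helper are hypothetical, and the struct only views memory it is given; it does not allocate or free anything.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, non-owning view of the three-pointer layout.
template <class T>
struct three_pointer_layout {
    T* data_;         // points to first element
    T* end_capacity_; // points to one past internal storage
    T* end_;          // points to one past last element

    T* begin() const { return data_; }
    T* end() const { return end_; }  // just returns a member, no addition

    std::size_t size() const {       // needs a subtraction
        assert(data_ <= end_);
        return static_cast<std::size_t>(end_ - data_);
    }
    std::size_t capacity() const {
        assert(data_ <= end_capacity_);
        return static_cast<std::size_t>(end_capacity_ - data_);
    }
    bool full() const { return end_ == end_capacity_; }  // time to reallocate
};
```

For a buffer of 8 ints with 3 in use, size() is 3, capacity() is 8, and full() is false.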

Of course, you could also use the PIMPL idiom to make swap much simpler at the cost of an extra indirection during access. But that's a matter of preference.

Evan Teran
So does this mean that at resize time there is temporarily a doubling of memory allocated?
John Smith
yes, during a resize, there is a period where the new memory has been allocated but the old one has not been deallocated yet.
Evan Teran
A: 

realloc only works on heap memory. In C++ you usually want to use the free store.

Noah Roberts
"free store"? what do you mean by that?
Jacek Ławrynowicz
http://www.gotw.ca/gotw/009.htm
Noah Roberts
FWIW, there's nothing that says you can't make the free store using the heap.
David Thornley
+1  A: 

From Wikipedia, as good an answer as any.

A typical vector implementation consists, internally, of a pointer to a dynamically allocated array,[2] and possibly data members holding the capacity and size of the vector. The size of the vector refers to the actual number of elements, while the capacity refers to the size of the internal array. When new elements are inserted, if the new size of the vector becomes larger than its capacity, reallocation occurs.[2][4] This typically causes the vector to allocate a new region of storage, move the previously held elements to the new region of storage, and free the old region. Because the addresses of the elements change during this process, any references or iterators to elements in the vector become invalidated.[5] Using an invalidated reference causes undefined behaviour.
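A small sketch of the invalidation point, using std::vector itself. The function name storage_moved_after_append is made up for this example. It records the address as an integer before appending, since the old pointer must not be used once reallocation has happened.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: returns true if appending `extra` elements moved
// the vector's storage to a different address.
bool storage_moved_after_append(std::vector<int>& v, std::size_t extra) {
    assert(!v.empty());
    // Keep the old address as an integer, not as a pointer we might dereference.
    const std::uintptr_t before = reinterpret_cast<std::uintptr_t>(v.data());
    for (std::size_t i = 0; i < extra; ++i) v.push_back(0);
    return reinterpret_cast<std::uintptr_t>(v.data()) != before;
}
```

If the appended elements fit within capacity(), the storage stays put and existing references remain valid. Once the size exceeds the capacity, the elements live at a new address and any pointer, reference, or iterator obtained earlier is dangling.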

Serapth
+3  A: 

The reimplementation of vector as an exercise is covered in detail in Accelerated C++, a book you should probably read in any case.

anon
+2  A: 

Resizing the vector requires allocating a new chunk of space, and copying the existing data to the new space (thus, the requirement that items placed into a vector can be copied).

Note that it does not use new [] either -- it uses the allocator that's passed, but that's required to allocate raw memory, not an array of objects like new [] does. You then need to use placement new to construct objects in place. [Edit: well, you could technically use new char[size], and use that as raw memory, but I can't quite imagine anybody writing an allocator like that.]

When the current allocation is exhausted and a new block of memory needs to be allocated, the size must be increased by a constant factor compared to the old size to meet the requirement for amortized constant complexity for push_back. Though many web sites (and such) call this doubling the size, a factor around 1.5 to 1.6 usually works better. In particular, this generally improves chances of re-using freed blocks for future allocations.
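To see why a constant factor keeps push_back amortized constant, here is a small simulation that only does the bookkeeping, with no real memory involved. The names growth_cost and simulate_growth are invented for this sketch, and the rule "grow by at least one element" is an assumption to get started from a capacity of zero.

```cpp
#include <cassert>
#include <cstddef>

struct growth_cost {
    std::size_t reallocations;    // how many times a new block was needed
    std::size_t elements_copied;  // total elements copied across all reallocations
};

// Simulates n push_back calls where a full buffer grows to capacity * factor.
growth_cost simulate_growth(std::size_t n, double factor) {
    assert(factor > 1.0);
    growth_cost cost = {0, 0};
    std::size_t size = 0, capacity = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (size == capacity) {
            const std::size_t grown = static_cast<std::size_t>(capacity * factor);
            capacity = grown > capacity ? grown : capacity + 1;  // always grow by at least one
            cost.elements_copied += size;  // every live element moves to the new block
            ++cost.reallocations;
        }
        ++size;
    }
    return cost;
}
```

For 1000 elements and a factor of 2 this reports 11 reallocations and 1023 element copies in total, so the copying work stays proportional to the number of elements. A factor of 1.5 reallocates more often but wastes less capacity, and the total copies still grow linearly with n.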

Jerry Coffin
It should be noted that `realloc` _may_ have advanced support from the OS in that it is not just a plain `malloc/free` (e.g. see `HeapReAlloc` on Windows), and a conformant `vector` implementation can use e.g. type traits to detect if the type is POD, and use `malloc/realloc/free` rather than `new/delete` for that case.
Pavel Minaev
A: 

You'd need to define what you mean by "plain old structs."

realloc by itself only gives you a block of uninitialized memory. It does not construct any objects. For C structs, this suffices, but for C++ it does not.

That's not to say you couldn't use realloc. But if you were to use it (note you wouldn't be reimplementing std::vector exactly in this case!), you'd need to:

  1. Make sure you're consistently using malloc/realloc/free throughout your class.
  2. Use "placement new" to initialize objects in your memory chunk.
  3. Explicitly call destructors to clean up objects before freeing your memory chunk.

This is actually pretty close to what vector does in my implementation (GCC/libstdc++). The differences are that it uses the low-level C++ routines ::operator new and ::operator delete for the raw memory management instead of malloc and free, it reimplements the realloc step using those primitives, and it delegates all of this behavior to an allocator object that can be replaced with a custom implementation.

Since vector is a template, you actually should have its source to look at if you want a reference – if you can get past the preponderance of underscores, it shouldn't be too hard to read. If you're on a Unix box using GCC, try looking for /usr/include/c++/version/vector or thereabouts.

Owen S.