tags:

views:

817

answers:

10

Hello C++ Gurus,

I have been using stl vector a lot, and recently I asked this question to myself,,, "How is stl vector implemented"

I had two options:-
1) Linked list:- and then making the api to feel like random access (overloading [])

2) using new for e.g. DATA *temp = new DATA[20]:- I believe they do sth like this but then it raise one more question, do they always allocate a maximum (u_int32_t) storage to give random access(this is inefficient in terms of memory).

3) Or is there sth else that I should be aware of.

Thanks for taking your time and clearing my doubt,, Thanks Arun

+14  A: 

It's implemented by using an underlying array.

It's not possible to implement a vector<T> with a linked list because the standard guarantees the elements in the list will be held in contiguous memory.

JaredPar
Also, lookup time is important.
Ed Swangren
Contiguous memory is only required by the upcoming C++ standard. But all available implementation today use an array anyway.This is an indirect requirement by the O(1) access time.
Axel Gneiting
From the current (2003) standard: "The elements of a vector are stored contiguously" (23.2.4/1).
James McNellis
@Axel: No, it is already required by C++03. It was not required by C++98, but it was always intended to be contiguous, so C++03 made it explicit.
jalf
Okay seems that I mixed that one up. I'm sorry.
Axel Gneiting
@Downvoter, care to explain?
JaredPar
+2  A: 

There is no one way it is implemented. Different implementations can be different, so long as the preserve the semantics and satisfy the requirements.

At any given time, there has to be a primitive array of T to satisfy the requirements of contiguity. However, how it is allocated, grown, shrunk, and freed is up to the implementor.

You can read the implementation for yourself, it's right there in the header file.

I can tell you that no implementations use linked lists. They aren't consistent with the requirements of the standard.

bmargulies
I will assert unequivocally there is no implementation that is a linked list.
Jason
Part of the required semantics are O(1) random access. Linked list isn't a possibility.
jamesdlin
Potatoswatter
There is a lot of room for difference in how the array gets allocated, freed, and all that other stuff the OP asked about.
bmargulies
@bmargulies: There is a requirement for amortized constant time on push_back() operations, which requires exponential growth of the underlying array. The only wiggle room is what that growth factor is. I think 2 is most common, but 1.5 has been shown to be optimal IIRC.
Drew Hall
It gets allocated and freed by the vector's allocator, where the default allocator simply calls `operator new()`, `operator delete()`, and the class' ctor/dtor. Name one potential difference between two implementations besides the resize-step formula.
Potatoswatter
@Potatoswatter: It's a matter of emphasis. I chose to emphasize the fact that there is abstraction going on here, however minimal, and that the only way to have certain knowledge is to read the implementation. Obviously, my choice was not especially popular. That's why we got votes.
bmargulies
+2  A: 

I believe the STL uses option #2 (or something similar) because a std::vector<> is guaranteed to store the elements in contiguous memory.

If you're looking for a memory structure that doesn't need to use contiguous memory, look at std::deque.

ceretullis
Yes I know that, but I was wondering how vectors are implemented..... thanks
Arun
+7  A: 

They use a dynamically allocated array that is regrown as needed. It is necessary to use something like an array so that the elements are contiguous in memory which is guaranteed by the standard.

Incidentally, one common way of regrowing an array is to double the size as needed. This is so that if you are inserting n items at most only O(log n) regrowths are performed and at most O(n) space is wasted.

You can read one implementation for yourself at SGI (where STL was originally conceived).

Jason
A: 

Ok I now have an Idea,, For example what I can do is do sth like Data * temp = new Data(max);

if due to push_back if the total length increase by max,, I can create a new element,, for e.g.

max = 2*max;

Data *temp1 = new Data(max);

and copy the content of the previous vector.

This poses a new questions,, if there is more data then more resize and more inefficient.

Arun

Arun
you can reserve vector storage if you are dealing with large datasets, this will eliminate growth overhead. also, you probably want to use realloc if you want to implement yourself.
aaa
This should be a seperate question or a comment within your original question, not a response to your own question.
John Dibling
+3  A: 

I believe it is the third option. It can't just use new T[n] because then it would actually have to construct as many objects as it allocates. E.g

std::vector<Foo> v;
v.reserve(10);

If your implementation simply ended up doing new Foo[10] then you'd just have constructed 10 instances of Foo.

Instead it uses its allocator to allocate and deallocate raw memory (without constructing objects), and as needed (for example, when you actually push_back objects) places copy-constructed instances into correct memory locations in its reserve using placement new and removes them with explicit calls to the destructor (something you'd only do in combination with placement new). The allocator class provides following methods for that which I presume vector's implementations use

 void construct(pointer p, const_reference val);

  Returns:
    new((void *)p) T(val)

  void destroy(pointer p);

  Returns:
    ((T*)p)->~T()

(The "returns" probably should read "effect" or similar.)

More about placement new

UncleBens
+2  A: 

Section 23.2.4, ¶1 of the standard requires that arithmetic on pointers into a vector work the same as with pointers into an array.

The elements of a vector are stored contiguously, meaning that if v is a vector where T is some type other than bool, then it obeys the identity &v[n] == &v[0] + n for all 0 <= n < v.size().

This guarantees that the storage is in an array. Of course, if you resize the array to be bigger, it might get moved in memory.

Potatoswatter
+1  A: 

There's no actual array at all in any decent implementation (if there is, you can't use any object in it without a default constructor), but just raw memory that gets allocated. It gets allocated in a manner that's usually along the lines of doubling every time you need to expand it.

The vector then uses in place allocation to call the constructors of the class in the proper location once each slot actually gets used actually used.

When there is expansion it will try to reallocate in place (but this is a bit silly and doesn't normally work, think windows 98 heap compaction) but usually will end up making a whole new allocation and copying over.

A standard stl vector is always all together, but not all implementations work like that (I know, having written some of them). Probably none are exactly a linked list, though, either.

Charles Eli Cheese
More or less, but where are you taking the in-place reallocation from?
UncleBens
A: 

From what i have read in books and from the functionality of reserve and and the requirement that elements of vectors be contiguous, This is what i think could be a possible way to implement Vector.

1) Elements of vectors be contiguous , supporting O(1) random access and vectors should be compatible with C arrays. This just implies there are no linked lists.

2) When you call reserve it reserves additional memory. But reserve does call

new T[newSize]

to reserve more memory. Otherwise it will call default constructor. As uncleben explained whenever reserve is called the vector class just allocates more uninitialized memory usin its allocator (if required) and copy construct new objects into that memory using placement new(if more memory has been allocated)

3) Initially vector has some default capacity. for which uninitialized memory is allocated when the vector object is constructed

4) push_back copy constructs the object into the first available location. If required more memory has to be allocated in similar manner as reserve

Yogesh Arora
+1  A: 

A pedagogic (and thus simplified) version of a container called "Vec" is discussed in Chapter 11 of the wonderful (introductory) book "Accelerated C++". What they describe is a stripped-down version of std::vector, but I think it is still worth noting that:

1) they implement their template class in terms of an array,

2) they discuss push_back in terms of the trick (mentioned above) of allocating more storage than is needed, and coming back for more when they run out, and

3) they use allocator<T> for memory management. The new operator is not flexible enough in this context, since it both allocates and initializes memory.

I repeat, though, that this doesn't mean that actual implementations out there are this simple. But since "Accelerated C++" is quite widespread, those interested can find in the relevant chapter one way vector-like objects can be created, copied, assigned, and destroyed.

EDIT: On a related note, I just found the following blog post by Herb Sutter in which he comments on an earlier blog post by Andrew Koenig, regarding whether or not one should be worried about vector elements being contiguous in memory: Cringe not: Vectors are guaranteed to be contiguous.

Alexandros Gezerlis