views:

255

answers:

3

I came across this as I was trying to learn array and vectors in c++. What is the "paging effect" mentioned in the post? Also, just to check my own understanding, I think vector uses more time is because of the dynamic memory allocation. Am I right?


additional question: but with vector<int> arr( 10000 ) isn't there already enough memory for 10000 int allocated? or put it this way, does arr still grow if all I do is iterating through each element and initializing them?

+5  A: 

Vector uses dynamic allocation if you use push_back(), but you can force it to pre-allocate memory with reserve().

Checked builds (common in debug libraries) also check the bounds for vector operations which can slow them down in debug mode. Release builds should be no slower than raw C.

Paging means moving memory out to disk when physical memory is full. You have to be careful with timing if you think memory is being paged out. A common technique is to run the task multiple times and reject the longest times.

edit: You should (almost) never use the raw 'C' type instead of the STL for efficiency. The people that wrote the STL are both really smart and really care about performance. Used properly it should never be worse than 'C' and is often better. The same goes double for using STL algorithms rather than your own han rolled loops.

Martin Beckett
Vectors will be slower than raw arrays because as they grow they have to re-allocate and move the old data. You can avoid this by using .reserve() (or .resize) up front to tell them how large to be.
Tom Leys
He mentioned reserve() already. Additionally, when using the same approach with raw arrays as with vector you also have to re-allocate and copy.
Georg Fritzsche
@mgb A STL vector will not always incur dynamic allocations when inserting elements. Most (all?) implementations will double the capacity when growing to allow for linear time insertion on average. Specifically, no allocation will occur within the vector itself when capacity() is greater than size(). The object being inserted could of course incur its own allocation when being copied into the vector.
Void
Actually, recent research has proven that a growth factor of 2 isn't actually that good, and the STL requirements are in fact met for any exponential growth factor. It turns out that 1.5 is a lot better, yet easy to implement (`m_capacity += m_capacity >> 1`, fairly fast)
MSalters
It won't incur a realloc for every push_back but if you don't reserve() then it will have at least one !
Martin Beckett
"The people that wrote the STL are both really smart and really care about performance. Used properly it should never be worse than 'C' and is often better." This has not been my experience at all -- STL's algorithms have always done much worse for me than custom-built containers of native types.
Crashworks
+1  A: 

The page you linked to points out that optimisation removes the performance difference. This means it is most likely caused by extra function calls in vector - you can safely ignore these because the optimiser is smart enough to inline them.

The poster is using the name "paging effect" but what they are actually referring to in the vector case is the cost of memory allocation. Also, by trying to write / read to that memory, they are pulling a segment at the end of the arrays into the cache, perhaps improving future performance in that memory area.

Tom Leys
A: 

The author of that code is forcing

array[size - 1] = 0;

access to make sure that at least one access to the array buffer has been made prior to running the main part of his code. This is done to mitigate effects of using virtual memory swap file - that access increases the probability of the buffer not being swapped out to the paging file when the main code starts. Without this it could happen that when the first access to the buffer was made swapping would occur and that would increase the runtime significantly and lead to false meausurement results.

This will only have guaranteed effect if the buffer size is not greater than the size of the minimum memory segment the operating system uses for addressing virtual memory.

This is not a problem specific to C++.

Also vector is slower because accessing it requires two memory accesses - one to retrieve the buffer address, another one to actually access memory. With C array the buffer address is already known and only one access is needed. If the same code with vector is rewritten using a temporary for storing the buffer address it runs much faster.

sharptooth
A downvote is great! But where's the reasoning?
sharptooth
(Not my downvote) The "two memory accesses" statement is plain wrong, as is the conclusion that follows about the speed of C++ relative to C (Whether you have a `int*` or a `vector<int>`, if it's on the stack you need to load the base address into a register). Besides wrong, it's also irrelevant to the question.
MSalters
VC++ emits "two memory accesses" code for vector in such cases.
sharptooth
I think his point is that it'd be two memory accesses even if it had been a raw pointer. Of course, it could be avoided if the entire array was on the stack, but a dynamically allocated array would suffer the extra memory access just like a vector
jalf
Remove the last paragraph (...requires two memory accesses...) for an upvote.
Tom Leys