...well, I got strange results!

I was curious about the performance of std::vector versus that of a dynamic array. Seeing as there are many questions on this subject already, I wouldn't have mentioned it if I didn't consistently get these 'contradictory' results: vector<int> is somehow faster than new int[]! I always thought that if there was any performance difference, it would favor the dynamic array. How is this result possible?

The code is below:

#include <ctime>
#include <iostream>
#include <vector>

using namespace std;

int main()
{
    const int numElements = 10000000;
    long j = 0;
    long k = 0;

    vector<int> intVector(numElements);
    int* intArray = new int[numElements];

    clock_t start, finish;
    start = clock();

    for (int i = 0; i < numElements; ++i)
        intVector[i] = i;
    for (int i = 0; i < numElements; ++i)
        j += intVector[i];

    finish = clock();
    cout << "j: " << j << endl;
    // clock() returns ticks; the difference is in milliseconds only where
    // CLOCKS_PER_SEC happens to be 1000 (as it is on MSVC).
    cout << "Total duration: " << (double)(finish - start) << " ms." << endl;

    // Test control.
    start = clock();

    for (int i = 0; i < numElements; ++i)
        intArray[i] = i;
    for (int i = 0; i < numElements; ++i)
        k += intArray[i];

    finish = clock();
    cout << "k: " << k << endl;
    cout << "Total duration: " << (double)(finish - start) << " ms." << endl;

    delete[] intArray;
}

Optimizations were on. I also tried separating the for loops into their own start/finish blocks so that I could time the initializations of the array and the vector separately; in that case, std::vector<int> and new int[] appear to perform identically.
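That separated variant looked roughly like this (a sketch reusing the variables from the snippet above, with only the initialization loops inside the timed regions):

start = clock();
for (int i = 0; i < numElements; ++i)
    intVector[i] = i;
finish = clock();
cout << "vector init: " << (double)(finish - start) << " ms." << endl;

start = clock();
for (int i = 0; i < numElements; ++i)
    intArray[i] = i;
finish = clock();
cout << "array init: " << (double)(finish - start) << " ms." << endl;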

However, with the original (combined) code above I consistently get std::vector<int> winning at 26-30 ms versus 36-45 ms for new int[].

Anyone care to explain why the vector is performing better than the dynamic array? Both were declared before the timing loops, so I expected performance to be about the same. Furthermore, I tried the same idea using std::vector<int*> and new int*[] and got similar results, with the vector outperforming the dynamic array, so the same holds when the elements are pointers.

Thanks for the help.

Addendum: Without optimization, std::vector loses out big time to the dynamic array (~1,400 ms vs. ~80 ms), which is the expected direction of the difference. But doesn't this imply that the vector class can somehow be optimized to give better performance than a plain dynamic array?

+2  A: 

My wild guess is that the OS isn't allocating physical memory until it's first accessed. The vector constructor will initialise all the elements, so the memory will be allocated by the time you've started timing. The array memory is uninitialised (and possibly unallocated), so the time for that might include the allocation.

Try changing the array initialisation to int* intArray = new int[numElements](); to value-initialise its elements, and see if that changes the results.
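For instance, a minimal sketch of the two allocation forms (intZeroed is just an illustrative name; the trailing parentheses request value-initialization, which zeroes every element and therefore touches every page up front, much as the vector's constructor does):

int* intArray  = new int[numElements];   // elements uninitialized; pages may not
                                         // be committed until first written
int* intZeroed = new int[numElements](); // value-initialized to 0; every page is
                                         // written before the timing starts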

Mike Seymour
Good point. Do you think the page fault occurring when the memory is first accessed can explain the difference? I don't know about Windows, but on Linux the memory is effectively allocated when the first access is made.
Alexandre C.
And your wild guess was perfectly on target! That pair of parentheses now gives the exact same performance. Good catch.
Kristian D'Amato
@Alexandre: yes, if the physical memory isn't allocated then the first access will cause a page fault. The OS will handle this by allocating pages (several thousand of them, depending on the platform's page size); it may also have to initialise them to zero. This is going to take a fair amount of time. I also don't know about Windows, but I have a vague idea that it does more or less the same sort of thing.
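To level the playing field without changing the allocation, another option (a sketch; it assumes one memset over the whole buffer is enough to commit the pages) is to touch the array once before the timed region:

#include <cstring>

// Write every byte once so the page faults are taken here, outside the
// timed region -- mirroring what the vector's zeroing constructor already
// did for its own buffer.
std::memset(intArray, 0, numElements * sizeof(int));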
Mike Seymour
+4  A: 

For all practical purposes, they're the exact same speed when used this way. vector's operator[] is typically implemented like this [MSVC version]:

const_reference operator[](size_type _Pos) const
{   // subscript nonmutable sequence
    return (*(_Myfirst + _Pos));
}

... which is the same as:

const_reference operator[](size_type _Pos) const
{   // subscript nonmutable sequence
    return _Myfirst[_Pos];
}

Your test is basically just testing your compiler's ability to inline code, and it appears to be doing it nicely here.

As for explaining the differences, any answer you get is going to be hypothetical without seeing the disassembly. It could have to do with better caching, registers being utilized better in the first case (try swapping the order of the tests and see what happens), etc. One thing worth noting is that the vector's memory will already have been accessed before the test starts, since its constructor initializes every element to T().

Unfortunately we can't simply write little micro-tests like these and make general conclusions from them. We used to be able to do this more before systems and optimizing compilers became so complicated, but now there are far too many variables involved to make meaningful conclusions from anything but real-world tests.

It's for this same reason that we generally expect anyone who is serious about performance to actively profile their code: things have become far too complicated for people to correctly determine the bottlenecks, short of obvious algorithmic inefficiencies. I've often seen even expert programmers, with a far superior understanding of assembly and computer architecture than mine, get this wrong when I check their hypotheses with the profiler.

Amen to the last paragraph. Artificial testing is nearly useless. Find a real problem, benchmark and change things until the benchmark improves.
Adam Shiemke
+1  A: 

I just did this experiment. Strange behavior indeed, although I think I figured it out.

Repeat your code again. That is...

benchmark vector
benchmark array

benchmark vector
benchmark array

You'll notice that you get different numbers the second time. My guess? Page faults. The vector for some reason doesn't cause page faults, while the array method does. After the pages are loaded, both run at approximately the same speed (i.e., that's what happens the second time). Does anyone know how to print the number of page faults a process has incurred so far?
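On POSIX systems, getrusage reports those counters; a minimal sketch (printPageFaults is a hypothetical helper name; on Windows, I believe GetProcessMemoryInfo exposes a similar PageFaultCount):

#include <iostream>
#include <sys/resource.h>  // getrusage (POSIX only)

// Print the minor (soft) and major (hard) page-fault counts this
// process has accumulated so far.
void printPageFaults(const char* label)
{
    rusage usage;
    getrusage(RUSAGE_SELF, &usage);
    std::cout << label
              << ": minor faults = " << usage.ru_minflt
              << ", major faults = " << usage.ru_majflt << std::endl;
}

Calling it immediately before and after each benchmark block should show, if the guess is right, the array run taking the jump in minor faults.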

Dragontamer5788