views:

846

answers:

4

I have got here two programs with me, both are doing exactly the same task. They are just setting an boolean array / vector to the value true. The program using vector takes 27 seconds to run whereas the program involving array with 5 times greater size takes less than 1 s. I would like to know the exact reason as to why there is such a major difference ? Are vectors really that inefficient ?

Program using vectors

#include <iostream>
#include <vector>
#include <ctime>

using namespace std;

int main(){
 const int size = 2000;
 time_t start, end;
 time(&start);
 vector<bool> v(size);
 for(int i = 0; i < size; i++){
  for(int j = 0; j < size; j++){
   v[i] = true;
  }
 }
 time(&end);
 cout<<difftime(end, start)<<" seconds."<<endl;
}

Runtime - 27 seconds

Program using Array

#include <iostream>
#include <ctime>

using namespace std;

int main(){
 const int size = 10000; // 5 times more size
 time_t start, end;
 time(&start);
 bool v[size];
 for(int i = 0; i < size; i++){
  for(int j = 0; j < size; j++){
   v[i] = true;
  }
 }
 time(&end);
 cout<<difftime(end, start)<<" seconds."<<endl;
}

Runtime - < 1 seconds

Platform - Visual Studio 2008 OS - Windows Vista 32 bit SP 1 Processor Intel(R) Pentium(R) Dual CPU T2370 @ 1.73GHz Memory (RAM) 1.00 GB

Thanks

Amare

+31  A: 

You're using std::vector of bool and that's not what you think it is!

vector of bool is a bastard child template specialization that should never have existed and actually stores 1 bool in each bit. Accesses into it are more complicated due to masking and shifting logic, so will definitely be somewhat slower.

Click here for some info on vector of bool.

Also, you may be running an unoptimized build (almost certainly given the times you listed, 27 seconds is outrageous for 4 million iterations). The Standard Template Library relies very heavily on the optimizer to do things like inline function calls and elide temporaries. Lack of this optimization causes an especially heavy performance degradation for vector of bool because it has to return a proxy object when you index into it, because you can't take the address of a bit, so operator [] can't return a reference.

Click here for more info on proxied containers (The last half is about vector of bool)

In addition many STL implementations have helpful debugging bits that are not part of the standard which help you catch bugs, but really drag performance down. You'll want to make sure those are disabled in your optimized build.

Once you turn on the optimizer, have the right settings (i.e. no STL debugging turned on), and are actually testing the same thing in both loops, you will see almost no difference.

I had to make your loop much larger to test on my machine, but here's two builds of your vector of bool loop on my machine, showing the impact of optimizer flags on STL code

$ g++ main.cpp 
$ ./a.out 
17 seconds.
$ g++ -O2 main.cpp 
$ ./a.out 
1 seconds.
Don Neufeld
Ya i also think so. I run the same scenario and it took almost same time.
Vivek
VC2005+ in particular has bound checking and iterator validation checks for debug builds for all STL objects.
Pavel Minaev
Thanks don.neufeld, your explanation as well as the link are really helpful. Good to learn something new :-)
You should also try out a `std::deque<bool>`, it does not suffer from the template specialization error that was made with the vector.
Matthieu M.
+2  A: 

Have a look at this

Using arrays or std::vectors in C++, what’s the performance gap?

astander
+1  A: 

Other answers are very good, but you could easily answer it yourself by this method.

ADDED: In response to comments, let me show you what I mean. I'm running VC On Windows, but this works on any language/OS. I took your first program and increased size to 20000 so it would run long enough. Then while it was running, I took several stackshots. They all look like this:

std::vector<bool,std::allocator<bool> >::begin() line 93 + 25 bytes
std::vector<bool,std::allocator<bool> >::operator[]() line 132 + 37 bytes
main() line 24 + 12 bytes
mainCRTStartup() line 206 + 25 bytes
KERNEL32! 7c817077()

So what that says is that it is spending essentially all of it's time in the indexing operation on line 24, and the reason it's spending that time is that the [] operator is calling the begin operator. In more detail:

main() line 24 + 12 bytes

is this code:

for(int j = 0; j < size; j++){
==> v[i] = true;
}

that calls:

std::vector<bool,std::allocator<bool> >::operator[]() line 132 + 37 bytes

which is this code (that I reformatted a little bit):

reference operator[](size_type _P){
==> return (*(begin() + _P));
}

that calls:

std::vector<bool,std::allocator<bool> >::begin() line 93 + 25 bytes

which is doing this (in more detail):

92:       iterator begin()
93:           {return (_First); }
00402890   push        ebp
00402891   mov         ebp,esp
00402893   sub         esp,44h
00402896   push        ebx
00402897   push        esi
00402898   push        edi
00402899   push        ecx
0040289A   lea         edi,[ebp-44h]
0040289D   mov         ecx,11h
004028A2   mov         eax,0CCCCCCCCh
004028A7   rep stos    dword ptr [edi]
004028A9   pop         ecx    <===============
004028AA   mov         dword ptr [ebp-4],ecx
004028AD   mov         eax,dword ptr [ebp-4]
004028B0   mov         eax,dword ptr [eax+4]
004028B3   pop         edi
004028B4   pop         esi
004028B5   pop         ebx
004028B6   mov         esp,ebp
004028B8   pop         ebp
004028B9   ret

What it is doing is writing 68 bytes of 0xCC on the stack (for some debug reason) as part of getting the begin address of the vector, as part of computing the address of v[i], prior to doing the assignment.

The fraction of time it spends doing this is near 100%, because it was doing it on every one of the several samples that were taken. Could you have guessed that that's what it was spending nearly all of its time doing? I couldn't have.

This is, of course, a Debug build. If you switch to a Release build, but turn on debugging information, all these functions get inlined and optimized away, so it goes like 30 times faster, and again the stackshots tell exactly what it's doing.

So - people can tell you what it might be doing, but this shows how to find out for yourself what it is really doing.

On your environment it will undoubtedly be different.

Mike Dunlavey
yes indeed. Rather than understanding the properties of standard library datastructures, point him to information on how to profile your code **in another OS than he's actually using**. And if you've ever tried to debug or profile or otherwise read standard library containers, you'll know that it's not exactly easily readable. Profiling might tell you which lines of code causes the slowdown, but it may not answer the question of *what's going on*.
jalf
@jalf: Come on. It's **independent of OS**, and profiling as it's commonly understood may not tell you what's going on, but stackshots will tell you *exactly what's going on* as long as there's source code of the libraries.
Mike Dunlavey
... It's the old saw about giving someone a fish versus teaching them to fish.
Mike Dunlavey
+1  A: 

std::vector<bool> is optimized for memory consumption rather than performance.

You may trick it by using std::vector<int>. You should not have performance drawbacks then.

Pavel Radzivilovsky
Fixed your post to use code formatting. The angle brackets disappeared without it
jalf
Instead of `vector<int>` I'd suggest `vector<char>` (or `unsigned char`, or if the compiler supports it `std::uint8_t`). No reason to use more space than you need. But definitely not `vector<bool>`.
AFoglia
A reason to use more space is better speed on most 32-bit architectures.
Pavel Radzivilovsky
The speed difference between `int` and `char` probably isn't worth using 4x more memory for (especially since cache misses hurt much more).
Brendan Long