views:

422

answers:

8

I've been trying to convince a friend of mine to avoid using dynamically allocated arrays and start moving over to the STL vectors. I sent him some sample code to show a couple things that could be done with STL and functors/generators:

#include <iostream>
#include <vector>
#include <algorithm>
#include <iterator>

#define EVENTS 10000000

struct random_double {
  double operator() () { return (double)rand()/RAND_MAX; }
};  

int main(int argc, char **argv){

  std::vector<double> vd (EVENTS);

  generate(vd.begin(), vd.end(), random_double());
  copy(vd.begin(), vd.end(), std::ostream_iterator<double>(std::cout, "\n"));

  return 0;
}

His reply to this, although he feels it's more elegant, is that his own code is faster (by almost a factor of 2!) Here's the C code he replied with:

#include <stdio.h>
#include <stdlib.h>
#include <malloc.h>
#include <string.h>

#define EVENTS 10000000

__inline double random_double() {
  return (double)rand()/RAND_MAX;
}


int main(int argc, char **argv){
  unsigned int i;
  double *vd;
  vd = (double *) malloc(EVENTS*sizeof(double));

  for(i=0;i<EVENTS;i++){ vd[i]=random_double(); }

  for(i=0;i<EVENTS;i++){ printf("%lf\n",vd[i]); }

  free(vd);

  return 0;
}

So I ran the simple timing test to see just what happens, and here's what I got:

> time ./c++test > /dev/null
real    0m14.665s
user    0m14.577s
sys     0m0.092s

> time ./ctest > /dev/null
real    0m8.070s
user    0m8.001s
sys     0m0.072s

The compiler options, using g++ were: g++ -finline -funroll-loops. Nothing too special. Can anyone tell me why the C++/STL version is slower in this case? Where is the bottleneck, and will I ever be able to sell my friend on using STL containers?

A: 

In high performance situations (such as games), it would be wise to avoid STL containers. Yes they provide excellent functionality, but they also provide a bit of overhead. And that can be disastrous.

Personally, I only use std's file handling and the occasional vector.

But what do I know? ;)

EDIT: Have your friend take a look at Thrust, which attempts to provide STL like functionality for calculating on the GPU.

knight666
That's wrong, as you can see from my answer the performance loss is 0.1%..
Andreas Bonini
Though your answer to the question seems wrong to me, the Thrust link is certainly worth a look!
xtofl
@Koper: Because of the printf formatting a floating point number, you're not even comparing the performance of the loop. You can't answer this question when you load it down with superfluous unrelated computation.
Mike Dunlavey
+19  A: 

Almost certainly the use of the iostream library versus printf(). If you want to time the algorithm, you should do your output outside the loop.

anon
This. If you have to do output there, make it a fair contest and use printf().
David Seiler
+4  A: 

Using STL, especially when using vectors and other nice utility classes, is probably always going to be slower than hand-rolled C code using malloc and inlined functions. There is no real way around it.

That being said, performance is not everything - not nearly so. Using STL provides many other benefits, including:

  1. Better maintainability: It's more expressive, so you get more done in less code, in a more elegant, clean fashion.
  2. Safety: Using vectors is much, much safer than dealing with pointers and malloc directly.
  3. Flexibility: By using vectors with functors, you'd have a much easier time if you needed to grow this collection on the fly, for example.
  4. Productivity: By making the code cleaner, STL promotes reuse much more effectively than lots of individual C routines to do similar functions.

You're really trying to argue about working at a higher level of abstraction - there are tradeoffs here, typically in terms of performance, but there is a reason nearly all development has gone to higher abstraction levels; the gains are far more valuable than the sacrifices in most cases.

Reed Copsey
Actually the STL `vector` class was designed so that it would cause (almost) zero overhead when compared to pointers. For the iostreams, you may be right, though.
xtofl
Yes - but almost 0 != 0. There are some differences in there. For iostreams, it's a much bigger difference.
Reed Copsey
As I see it, there are three ways to use the vector. 1) Let it grow naturally: overhead of reallocation. 2) Resize to needed size: overhead of zero-initializing all items. 3) Reserve needed size: overhead of still having to check during `push_back` if there's enough room. That said, it is so much more convenient than keeping track of things yourself, and the performance is mostly good enough.
UncleBens
Yeah - I would always vote to use it, but it does carry a (albeit minor) performance difference from doing things yourself.
Reed Copsey
I would like to know where people disagree strongly enough for the downvotes, though...
Reed Copsey
For a good compiler (and this includes recent versions of icc and gcc, not sure about msvc), `vector` produces identical code to C arrays.
Tom
+17  A: 

Using printf:

  for (std::vector<double>::iterator i = vd.begin(); i != vd.end(); ++i)
     printf("%lf\n", *i);

results are:

koper@elisha ~/b $ time ./cpp > /dev/null
real    0m4.985s
user    0m4.930s
sys     0m0.050s
koper@elisha ~/b $ time ./c > /dev/null
real    0m4.973s
user    0m4.920s
sys     0m0.050s

Flags used: -O2 -funroll-loops -finline

Andreas Bonini
+1 for doing an actual test!
Soo Wei Tan
same result as mine :)
xtofl
Won't this code be spending nearly all its time formatting the floating point number? Even if the ++i gets poorly optimized, it's going to use a small percentage of the cycles.
Mike Dunlavey
... If you're trying to compare speed of iteration, get rid of the print. It's the whale in the bathtub.
Mike Dunlavey
A: 

There might be occasions were STL is slower, but handrolling maps/sets for multiple insert/remove/lookup would be hard to accomplish.
As Neil pointed out, in speed, printf wins from iostream (also a point in Scott Meyers "More Effective C++, point 23"). However, in more complicated systems. it pays off to be able to write out complete classes inside loggings. The printf way would be to sprintf class information in a function and pass that to the logger as a string parameter. That way, the gain would be smaller.

stefaanv
+1  A: 

I would argue that you are not even running the same code.
The C code has no error checking and leaks memory on an exception.
To be a fare comparison you need to make the C program do what the C++ program is doing.

bool errorNumber = 0;   // Need a way to pass error information back from the funtion
int main(int argc, char **argv)
{
    ......
    {
        vd[i]=random_double();
        //
        // In C++ this logic is implicit with the use of excptions.
        // Any example where you don't do error checking is not valid.
        // In real life any code has to have this logic built in by the developer
        //
        if (errorNumber != 0)
        {    break;
        }
    }
    ........
    free(vd);  The cost of freeing the memory is not zero that needs to be factored in.
    return 0;
}
Martin York
I disagree. The only functions that can fail in the C example are malloc and printf. malloc may file if he is out of memory, but in that very rare case there is nothing you can do anyway.. printf can in theory file but in practice never does. Anyway, adding that error check is irrelevant to the performance (it will execute so fast that it won't make a difference). Least but not last the memory will be reclaimed by the OS on exit so free() is not necessary (and it's C++'s "fault" that it frees anyway!). But, again, it's only one single free() in a 5 seconds execution time: it won't matter.
Andreas Bonini
I tried to add the error checking and run `time ./c > /dev/null` again, it actually ran faster (obviously not because of the extra code but because of the non deterministic nature of interrupts).
Andreas Bonini
@Koper: Are you really arguing there's no need to call free()? Even if it is freed by the OS at process exit, it is extraordinarily bad coding practice to rely on that. A long running programming could conceivably use more and more memory because of the leak resulting from not calling free. Furthermore, not all OSes will reclaim the memory at process exit, particularly those without a virtual memory manager. Real-time embedded OSes often do not have a VM, so you'd be hosed on such platforms if you didn't properly manage your memory.
Void
@Koper: If you look specifically at this function 'random_double'. Then fine it can never fail. But if you look at this as a general function that has some arbitrory code then you need some way of checking error conditions and forcing the loop to exit and doing the general memory management.
Martin York
@Kopir: If you look at this code as the whole program fine (Ten line program in C/C++ sure. maintance is zero, but in that case who cares about 6 seconds?) I was looking at this as a slice of a real application that was extracted just to do timeing; In this context then yes you do need to do the free, all the other checking, loop termination, error propogation, memory release etc. All the stuff needed by a real application that needs to be explicitly added into a real C application otherwise you are not doing an accurate comparison of the two pieces of code.
Martin York
@Kopir: cont...: In the comparison above what you are saying is: I have this hand optimized super fast loop in C where I do not require any checking, can I campare this against the compiler optimized version of C++ that is not allowed to make any of those assumptions. This is comparing apples to oranges.
Martin York
Well I see your point, I considered it not as a part of a real application but for what it really is: a stupid program that exits right after. I guess your approach makes more sense though.
Andreas Bonini
+2  A: 

Believing in the bad performance of the insertion iterator of std::cout, I tried to insert the following functor:

struct Print {
  void operator()( double d ) { printf("lf\n", d); }
};

And use for_each on the stl container.

 generate(vd.begin(), vd.end(), random_double());
  //copy(vd.begin(), vd.end(), std::ostream_iterator<double>(std::cout, "\n"));
  std::for_each(vd.begin(), vd.end(), Print() );

As a matter of fact, I now got

time.exe raw_vs_stl.exe stl > t.txt
real    0m 2.48s
user    0m 1.68s
sys     0m 0.28s

for the STL version... while the 'raw' version results in more or less the same.

time.exe raw_vs_stl.exe raw > t.txt
real    0m 9.22s
user    0m 7.89s
sys     0m

0.67s Conclusion: vector performance is as good as a raw array's. It's safer and easier to use.

(disclaimer: used VC2005)

xtofl
+1  A: 

One trick for getting an idea of both the difference in speed between two implementations - and the reasons for it - is to delve into the assembly. Assembly really isn't that scary, and shows you exactly what's going on. It's also really handy for seeing what the compiler optimizes out. Generally speaking, more assembly instructions = longer, but keep in mind that some instructions take much longer than others.

In Visual Studio (and, I suspect, many other IDs) there's an option to look over assembly interleaved with the corresponding C++ lines. (In VC, this is Debug->Windows->Dissassembly).

Narfanator