tags:

views:

989

answers:

14

Hi, Which of the following is better and why? (Particular to c++)

a.

int i(0), iMax(vec.length());//vec is a container, say std::vector
for(;i < iMax; ++i)
{
  //loop body
}

b.

for( int i(0);i < vec.length(); ++i)
{
  //loop body
}

I have seen advice for (a) because of the call to length function. This is bothering me. Doesn't any modern compiler do the optimization of (b) to be similar to (a)?

A: 

Simple question: are you modifying vec in the loop?

answer to this question will lead to your answer too.

jrh

Here Be Wolves
+7  A: 

I like:

for (int i = 0, e = vec.length(); i != e; ++i)

Of course, this would also work for iterators:

for (vector<int>::const_iterator i = v.begin(), e = v.end(); i != e; ++i)

I like this because it's both efficient (calling end() just once), and also relatively succinct (only having to type vector<int>::const_iterator once).

Assaf Lavie
This approach seems to be more readable to me. Is it documented in any book? Which? I am reading this for the first time on stackoverflow.
Amol Gawai
No, `end` is called more than once but since the call is inlined that won't matter.
Konrad Rudolph
@Konrad, why do you say end is called more than once in the 2nd example? How many times do you think it's called (and why)?
Assaf Lavie
what's wrong with calling i != v.end() in loop? it's probably inlined and end() returns pointer one past end of array ...
stefanB
@stefan, true, but they also might not be. And in any case, why not write it in the way that you know for certain is most efficient, regardless of compiler optimizations?
Assaf Lavie
@stefan, I suggest you do some measurements with your compiler and measure the actual difference between these styles. I did this once for VC 2003 and there was a real difference.
Assaf Lavie
A: 

It's very hard for a compiler to hoist the vec.length() call in the safe knowledge that it's constant, unless it gets inlined (which hopefully it often will!). But at least i should definitely be declared in the second style "b", even if the length call needs to be "manually" hoisted out of the loop!

Alex Martelli
It's very hard for compilers, but branch predictor will take care of it.
Artem Barger
+3  A: 

Why is it bodering you? Those two alternatives dont see to be doing the same. One is doing a fixed number of iterations, while the other is dependant on the loops body.

Another alternative colud be

for (vector<T>::iterator it=vec.begin();it!=vec.end();it++){
 //loop body
}
Tom
And similar to other answers, you can also use: for (vector<T>::iterator it = vec.begin(), end = vec.end(); it != end; ++it) { process(*it); }
Roger Pate
Furthermore, once it's down to a single function: std::for_each(vec.begin(), vec.end(), process);
Roger Pate
+2  A: 

Unless you need the loop variable outside the loop, the second approach is preferable.

Iterators will actually give you as good or better performance. (There was a big comparison thread on comp.lang.c++.moderated a few years back).

Also, I would use

int i = 0;

Rather than the constructor like syntax you're using. While valid, it's not idiomatic.

nsanders
I have habit of initializing almost anything this way since i use c++ extensively? Is it a bad habit?
Amol Gawai
@Amol: that's a wholly different discussion. It's not bad in any way.
xtofl
@nsanders: Great that you mention the variable scope!
xtofl
+2  A: 

Somewhat unrelated:

Warning: Comparison between signed and unsigned integer.

The correct type for array and vector indices is *size_t*.

Strictly speaking, in C++ it is even std::vector<>::size_type.

Amazing how many C/C++ developers still get this one wrong.

DevSolar
Yup - especially since the correct answer is vector<>::size_type not size_t.
MSalters
Strictly speaking, yes, you are correct. But I'd be happy enough to see size_t already, and the chances of educating people to use `vector<>::size_type` are... negligible.
DevSolar
The value of using vector<>::size_type over size_t is negligible as well. It's only useful in cases where you are already working with a template container (in which case container_type::size_type is more explicit than size_t).
Tom
+11  A: 

Example (b) has a different meaning to example (a), and the compiler must interpret it as you write it.

If, (for some made-up reason that I can't think of), I wrote code to do this:

for( int i(0);i < vec.length(); ++i)
{
    if(i%4 == 0)
       vec.push_back(Widget());
}

I really would not have wanted the compiler to optimise out each call to vec.length(), because I would get different results.

Andrew Shepherd
Perfect answer. You might want to add that if you don't change the length of the vector inside the loop, construct (a) in the question (or some of the similar suggestions in the other answers) should be preferred because it is hard for the compiler to optimize the call to length() away
stephan
compiler optimization or not if you use iterator with begin() - end() you'll get the end pointer once and you don't need to worry about size and all ...
stefanB
A: 

This one is preferable:

typedef vector<int> container; // not really required,
                               // you could just use vector<int> in for loop

for (container::const_iterator i = v.begin(); i != v.end(); ++i)
{
    // do something with (*i)
}
  • I can tell right away that the vector is not being updated
  • anyone can tell what is happening here
  • I know how many loops
  • v.end() returns pointer one past the last element so there's no overhead of checking size
  • easy to update for different containers or value types
stefanB
+1 for clear explanation. :)
Amol Gawai
and clear code ...
stefanB
greetings downvoters I don't like you go away ...
stefanB
+2  A: 

I'm surprised nobody has said the obvious:

In 99.99% of cases, it doesn't matter.

Unless you are using some container where calculating size() is an expensive operation, it is unfathomable that your program will go even a few nanoseconds slower. I would say stick with the more readable until you profile your code and find that size() is a bottleneck.

rlbond
+1 for readable first, profiling second, and performance tweak third...
RBerteig
It means something different! So it _does_ matter!
xtofl
In 99.99% of cases, it doesn't matter. It only matters (a) if size() changes inside the loop, or (b) if checking size() is prohibitively slow.
rlbond
+1  A: 

Let's see on the generated code (I use MSVS 2008 with full optimization).

a.

int i(0), iMax(vec.size());//vec is a container, say std::vector
for(;i < iMax; ++i)
{
  //loop body
}

The for loop produces 2 assembler instructions.

b.

for( int i(0);i < vec.size(); ++i)
{
  //loop body
}

The for loop produces 8 assembler instructions. vec.size() is successfully inlined.

c.

for (std::vector<int>::const_iterator i = vec.begin(), e = vec.end(); i != e; ++i)
{
  //loop body
}

The for loop produces 15 assembler instructions (everything is inlined, but the code has a lot of jumps)

So, if your application is performance critical use a). Otherwise b) or c).

Sergius
Interesting. How to know assembler instruction details?
Amol Gawai
Except that in c, accessing elements in vec is fewer assembly instructions than in a or b.
rlbond
Got the answer on stackoverflow itself. http://stackoverflow.com/questions/840321/how-can-i-see-the-assembly-code-for-a-c-program
Amol Gawai
Number of assembly instructions usually doesn't matter. Modern CPUs remap instructions. They also have a bandwidth limit on accessing main memory. Since all loops access the same elements, they're all likely to hit this same limit.
MSalters
Modern CPUs can remap instructions but cannot skip them. And CPU cache may reduce influence of the memory bandwidth limit. But I agree with you, in most cases there will be no significant differrence between a), b) and c).
Sergius
A: 

(b) won't calculate/call the function each time.

-- begin excerpt ----

Loop Invariant Code Motion: GCC includes loop invariant code motion as part of its loop optimizer as well as in its partial redundancy elimination pass. This optimization removes instructions from loops, which compute a value which does not change throughout the lifetime of a loop.

--- end excerpt --

More optimizations for gcc:

https://www.in.redhat.com/software/gnupro/technical/gnupro_gcc.php3

+2  A: 

There are two issues to debate here:

  1. The variable scope
  2. The end condition re-evaluation

Variable scope

Normally, you wouldn't need the loop variable to be visible outside of the loop. That's why you can declare it inside the for construct.

End condition re-evaluation

Andrew Shepherd stated it nicely: it means something different to put a function call inside the end condition:

for( vector<...>::size_type i = 0; i < v.size(); ++i ) { // vector size may grow.
   if( ... ) v.push_back( i ); // contrived, but possible
}

// note: this code may be replaced by a std::for_each construct, the previous can't.
for( vector<...>::size_type i = 0, elements = v.size(); i != elements; ++i ) {
}
xtofl
+1  A: 

It should be noted that the iterator examples:

for (vector<T>::iterator it=vec.begin();it!=vec.end();it++){
 //loop body
}

could invalidate the loop iterator 'it' should the loop body cause the vector to reallocate. Thus it is not equivalent to

for (int i=0;i<vec.size();++i){
 //loop body
}

where loop body adds elements to vec.

equackenbush
good remark about the iterator invalidation!
xtofl
A: 

Why not sidestep the issue entirely with BOOST_FOREACH

#include <boost/foreach.hpp>

std::vector<double> vec;

//...

BOOST_FOREACH( double &d, vec)
{
    std::cout << d;
}
Roddy