views:

343

answers:

10

I have a vector containing large number of elements. Now I want to write a small function which counts the number of even or odd elements in the vector. Since performance is a major concern I don't want to put an if statement inside the loop. So I wrote two small functions like:

long long countOdd(const std::vector<int>& v)
{
    long long count = 0;
    const int size = v.size();
    for(int i = 0; i < size; ++i)
    {
     if(v[i] & 1)
     {
      ++count;
     }
    }
    return count;
}

long long countEven(const std::vector<int>& v)
{
    long long count = 0;
    const int size = v.size();
    for(int i = 0; i < size; ++i)
    {
      if(0 == (v[i] & 1))
     {
      ++count;
     }
    }
    return count;
}

My question is can I get the same result by writing a single template function like this:

template <bool countEven>
long long countTemplate(const std::vector<int>& v1)
{
    long long count = 0;
    const int size = v1.size();
    for(int i = 0; i < size; ++i)
    {
     if(countEven)
     {
      if(v1[i] & 1)
      {
       ++count;
      }
     }
     else if(0 == (v1[i] & 1))
     {
      ++count;
     }
    }
    return count;
}

And using it like this:

int main()
{
  if(somecondition)
  {
     countTemplate<true>(vec); //Count even
  }      
  else
  {
     countTemplate<false>(vec); //Count odd
  } 
}

Will the code generated for the template and non-template version be the same ? or will there be some additional instructions emitted?

Note that the counting of numbers is just for illustration hence please don't suggest other methods for counting.

EDIT: Ok. I agree that it may not make much sense from performance point of view. But atleast from maintainability point of view I would like to have only one function to maintain instead of two.

+1  A: 

This will depend on how smart the compiler optimizer is. The compiler might be able to see that really the if-statement is redundant and only one branch of it is executed and optimize the whole thing.

The best way to check is to try and look at the assembly - this code will not produce too much of machine code.

sharptooth
Are you suggesting that there exist compilers that won't optimize out an if(true) ?That's a scary thought...
jalf
Yes, I'm suggesting exactly that. Compilers can be very dumb and if you really care about such optimizations you need to look at the assembly code emitted.
sharptooth
Actually, the nice bit about C++ templates is that they put dumb compiler writers out of business. Any organization capable enough to produce a C++ frontend can also produce a backend capable of doing dead code optimization. The latter isn't exactly rocket science.
MSalters
+9  A: 

The templated version may and, very probably, will be optimized by the compiler when it sees a certain branch in the code is never reached. The countTemplate code for instance, will have the countEven template argument set to true, so the odd branch will be cut away.

(sorry, I can't help suggesting another counting method)

In this particular case, you could use count_if on your vector:

struct odd { bool operator()( int i )const { return i&1; } };
size_t nbOdd = std::count_if( vec.begin(), vec.end(), odd() );

This can also be optimized, and writes way shorter :) The standard library developers have given possible optimization much thought, so better use it when you can, instead of writing your own counting for-loop.

xtofl
As I said, counting even or odd is just an example to clarify my question. In actual code I will do something different.
Naveen
And countEven is of course implemented as `v.size() - countOdd(v);`
Steve Jessop
A: 

In general, the outcome will be much the same. You are describing an O(n) iteration over the linear memory of the vector.

If you had a vector of pointers, suddenly the performance would be way worse because the memory locality of reference would be lost.

However, the more general thing is that even netbook CPUs can do gazallions of operations per second. Looping over your array is most unlikely to be performance-critical code.

You should write for readability, then profile your code, and consider doing more involved hand-tweaked things when the profiling highlights the root cause of any performance issue you have.

And performance gains typically come from algorithmic changes; if you kept count of the number of odds as you added and removed elements from the vector, for example, it would be O(1) to retrieve...

Will
+2  A: 

I guess that good compiler will cut redundant code in your template as countEven is compile time constant and it is very simple to implement such optimization during template instantiation.

Anyway it seems pretty strange. You wrote a template but do "dynamic switching" inside. May be try something like that:

struct CountEven {}
struct CountOdd {}

inline void CountNum(int & num, long long & count, const CountEven &)
{
   if(num & 1)
   {
      ++count;
   }
}

inline void CountNum(int & num, long long & count, const CountOdd &)
{
   if(0 == (num & 1))
   {
      ++count;
   }
}


template <class T>
long long countTemplate(const std::vector<int>& v1)
{
    long long count = 0;
    const int size = v1.size();
    for(int i = 0; i < size; ++i)
    {
        CountNum(v1[i], count, T());
    }
    return count;
}

It will select necessary CountNum() function version on compilation stage:

int main()
{
  if(somecondition)
  {
     countTemplate<CountEven>(vec); //Count even
  }      
  else
  {
     countTemplate<CountOdd>(vec); //Count odd
  } 
}

Code is messy, but I think you got the idea.

bocco
A: 

I see that you're using long long for counter, and that probably means that you expect huge number of elements in vector. In that case, I would definitely go for template implementation (because of code readability) and just move that if condition outside for loop.

If we assume that compiler makes no optimization whatsoever, you would have 1 condition and possibly more than 2 billion iterations through vector. Also, since the condition would be if (true) or if (false) the branch prediction would work perfectly and execution would be less than 1 CPU instruction.

I'm pretty sure that all compilers on the market have this optimization, but I would quote my favorite when it comes to performance: "Premature optimization is the root of all evil" and "There're only 3 rules of optimization: Measure, measure and measure".

Aleksandar
The counter should be `size_t` anyway, not `long long` (which firstly isn't the type of `v.size()`, secondly isn't in the C+ standard, and finally may be wasted effort on some 16 and 32bit architectures if it has to use a 64bit arithmetic library instead of simple opcodes).
Steve Jessop
+1  A: 

The first thing that comes to my mind are the two optimization "rules":

  • Don't optmized prematurely.
  • Don't do it yet.

The point is that sometimes we bother about a performance bottleneck which will never happen in practice. There are studies that say that 20 percent of the code is responsible for 80 percent of the software execution time. Of course this doesn't mean you pessimize prematurely, but I don't think that's your case.

In general, you should do this kind of optmization only after you have actually run a profiler on your program and identified the real bottlenecks.

Regarding your function versions, as other have said this depends on your compiler. Just remember that with the template approach you won't be able to switch calls at runtime (template is a compile-time tool).

A final note: long long is not standard C++ (yet).

ltcmelo
What do you mean, "you wont be able to switch calls at runtime"? You can call counteTemplate<true> or countTemplate<false>, just like you could call countEven or countOdd. And I find it hard to imagine a compiler that wouldn't optimize the template version as well as the nontemplate one.
jalf
I'm saying that it's not possible to depend on runtime information for the template instantiation. For example: bool value = function_returning_bool();countTemplate<value>(...); //Error.
ltcmelo
@ltcmeloo: I admit `ìf(function_returning_bool()) countTemplate<true>(); else countTemplate<false>();` isn't exactly beautiful, but it does keep the `if` statement out of the loop.
sbi
+5  A: 

Your template version will generate code like this:

template <>
long long countTemplate<true>(const std::vector<int>& v1)
{
    long long count = 0;
    const int size = v1.size();
    for(int i = 0; i < size; ++i)
    {
        if(true)
        {
                if(v1[i] & 1)
                {
                        ++count;
                }
        }
        else if(0 == (v1[i] & 1))
        {
                ++count;
        }
    }
    return count;
}


template <>
long long countTemplate<false>(const std::vector<int>& v1)
{
    long long count = 0;
    const int size = v1.size();
    for(int i = 0; i < size; ++i)
    {
        if(false)
        {
                if(v1[i] & 1)
                {
                        ++count;
                }
        }
        else if(0 == (v1[i] & 1))
        {
                ++count;
        }
    }
    return count;
}

So if all optimizations are disabled, the if will in theory still be there. But even a very naive compiler will determine that you're testing a constant, and simply remove the if.

So in practice, no, there should be no difference in the generated code. So you can use the template version and don't worry about this.

jalf
If you crank the warning level up, you'll get a "conditional expression is constant" warning on those ifs. Use one of the STL <algorithm>s. I know you said counting was only an example, but there are lots of other algorithms (like for_each) that can be used to get highly optimized processing of the elements in a container.
Adrian McCarthy
+1  A: 

If you care about optimization issues try to make it like the following:

template <bool countEven>
long long countTemplate(const std::vector<int>& v1)
{
    long long count = 0;
    const int size = v1.size();
    for ( int i = 0; i < size; ++i ) {
      // According to C++ Standard 4.5/4: 
      // An rvalue of type bool can be converted to an rvalue of type int, 
      // with false becoming zero and true becoming one.
      if ( v1[i] & 1 == countEven ) ++count;
    }
    return count;
}

I believe that the code above will be compiled in the same code as without templates.

Kirill V. Lyadvinsky
+1  A: 

Use STL, Luke :-) It's even as example in reference

bool isOdd(int i)
{
    return i%2==1;
}

bool isEven(int i)
{
    return i%2==0;
}

std::vector<int>::size_type count = 0;
if(somecondition)
{
    count = std::count_if(vec.begin(), vec.end(), isEven);
}
else 
{
    count = std::count_if(vec.begin(), vec.end(), isOdd);
}
Tadeusz Kopec
A: 

If you absolutely absurdly care about fast looking code:

(a clever compiler, or one otherwise hinted at using directives or intrinsics, could do this in parallel using SIMD; CUDA and OpenCL would of course eat this for breakfast!)

int count_odd(const int* array,size_t len) {
   int count = 0;
   const int* const sentinal = array+len;
   while(array<sentinal)
      count += (*array++ & 1);
   return count;
}

int count_even(const int* array,size_t len) {
   return len-count_odd(array,len);
}
Will
What's the advantage of C arrays and pointers over a `std::vector` and its iterators?
sbi
Disadvantage actually. std::vector<int> might even be optimized to be 16-byte aligned so the equivalent code can use SIMD aligned operations, whereas this code cannot make the alignment assumption about `const int* array`.
MSalters