views:

304

answers:

7

I have the following code:

template <int size>
inline uint hashfn( const char* pStr )
{
   uint result = *pStr;
   switch ( size )
   {
      case 10:
         result *= 4;
         result += *pStr;
      case 9:
         result *= 4;
         result += *pStr;
      ...
      ...
      case 2:
         result *= 4;
         result += *pStr;
   }
   return result;
}

This code is a hash function for DNA sequences of certain lenghts where the length is a template parameter. It is an unrolled loop with a switch statement to jump in at the right spot. The size is however a constant since it is a template parameter. Could I specialize this for certain size values? Maybe with something like:

template <int 2>
inline uint hashfn( const char* pStr )
{
  uint result = *pStr;
  result *= 4;
  ++pStr;
  result += *pStr;
  return result;
}
A: 

There's no loop involved in your original code, so it cannot be unrolled.

McWafflestix
The loop is already unrolled. It the elimination of the compare in the switch statement that I am thinking about.
James Dean
A: 

You are allowed to specialize like that. However, I think you are focusing on the optimization of unrolling the loop. Until you profile your code and show that it is a bottleneck, you are better off not prematurely optimizing.

rlbond
+1  A: 

Please read up about loop unrolling on wikipedia. The whole point is to save comparisons on the loop variable. Did you profile the code? I do not see how this can save you cycles compared to a loop.

Any modern compiler should completely unroll a loop with a small static loop count for you.

I also hope you don't use the hashes for modulo-based hash tables, as you will lose the upper bits of your hash.

ebo
Actually, I've improved performance by rolling loops up tight. I think it had something to do with cache locality.
David Thornley
Improbable, as long as you are loading at least ints and do not change the order of the memory accesses. I would guess it had to do with branch prediction. A correct guess can make a great difference with low round trip times. Also: Are you sure your compiler did not unroll the loop again?
ebo
A: 

I have the following code: ... (code that uses Duff's device) ...

It is an unrolled loop with a switch statement to jump in at the right spot. The size is however a constant since it is a template parameter. Could I specialize this for certain size values?

You certainly can, but there will be a lot of boilerplate involved, for each possible size. With appropriate levels of abstraction, and use of the Boost metaprogramming and preprocessor libraries you might even get this down to something reasonable.

Your maintenance costs will go up, so I'll echo others and recommend you make sure you really should do this now that you know you can.

Max Lybbert
+3  A: 

I would tend to do it recursively with templates.

E.g. :

template<class TOp,int factor>
struct recursive_unroll {
    __forceinline static void result( TOp& k ) {
        k();
        recursive_unroll<TOp,factor-1>::result( k );
    }
};

template<class TOp>
struct recursive_unroll<TOp,0> {
    __forceinline static void result( TOp& k ) {}
};


struct  op    {
    op( const char* s ) : res( 0 ), pStr( s )   {}

    unsigned int res;
    const char* pStr;        
    __forceinline void  operator()()  {
        res *= 4;
        res += *pStr;
        ++pStr;
        //std::cout << res << std::endl;
    }
};

char str[] = "dasjlfkhaskjfdhkljhsdaru899weiu";

int _tmain(int argc, _TCHAR* argv[])
{
    op tmp( str );
    recursive_unroll<op,sizeof( str ) >::result( tmp );
    std::cout << tmp.res << std::endl;
    return 0;
}

This produces optimal code for me. Without __forceinline the code is not properly inlined.

You should always bench your code before using such optimizations. And then you should look at the assembly and decipher what your compiler already does for you. But in this case, it seems to be an boost (for me).


__forceinline is a Microsoft Visual Studio specific extension. The compiler should generate optimal code, but for this it doesn't. So here I used this extension.

Christopher
I like this one. Too bad my compiler does not compile it :-(.
James Dean
Compiles for me under VS 2008. What is the error your compiler gives?
Christopher
__forceinline is a MS extension.
jalf
For most compilers, there is an equivalent attribute. In a world where compilers were perfect, I would prefer to drop it, but in this sample it seems to be needed to get optimal code. (As careful studies of the generated assembler indicate)
Christopher
Updated to point out, that __forceinline is compiler dependent and may not be needed.
Christopher
A: 

I dont know if the semantics is right, but in this case the function template itself could be recursive:

template <int size>
inline uint hashfn( const char* pStr ) {
    uint result = *pStr;
    result *= 4;
    return result + hashfn<size-1>(++pStr);
}

//case 0 stops
template <>
inline uint hashfn<0>( const char* pStr ) {
    return 0;
}

As Christopher said, just be sure that it gets inlined...

ogoid
as the saying goes... for every problem there is a solution that is both simply and wrong... the code I posted won't work as is because you would need to pass the previous calculation to the next iteration, not just add them as I've done. This could be made by an functor as sugested by Christopher or by overloading hashfn to receive the previous step result.
ogoid
A: 

For code

template <size_t N>
void
function (float *a, float *b, float *c)
{
  for (size_t i = 0; i < N; ++i)
    {
      c[i] = a[i] * b[i];
    }
}

my compilers (msvc 2005, gcc 4.3) unroll loops automatically.

W55tKQbuRu28Q4xv