ansaurus

Question

Can one unroll a loop when working with an integer template parameter?

Answer 1

A:

There's no loop involved in your original code, so it cannot be unrolled.

McWafflestix 2009-07-08 17:25:23

The loop is already unrolled. It the elimination of the compare in the switch statement that I am thinking about.

James Dean 2009-07-08 17:28:43

Answer 2

A:

You are allowed to specialize like that. However, I think you are focusing on the optimization of unrolling the loop. Until you profile your code and show that it is a bottleneck, you are better off not prematurely optimizing.

rlbond 2009-07-08 17:25:42

Answer 3

+1 A:

Please read up about loop unrolling on wikipedia. The whole point is to save comparisons on the loop variable. Did you profile the code? I do not see how this can save you cycles compared to a loop.

Any modern compiler should completely unroll a loop with a small static loop count for you.

I also hope you don't use the hashes for modulo-based hash tables, as you will lose the upper bits of your hash.

ebo 2009-07-08 17:40:01

Actually, I've improved performance by rolling loops up tight. I think it had something to do with cache locality.

David Thornley 2009-07-08 20:32:45

Improbable, as long as you are loading at least ints and do not change the order of the memory accesses. I would guess it had to do with branch prediction. A correct guess can make a great difference with low round trip times. Also: Are you sure your compiler did not unroll the loop again?

ebo 2009-07-08 20:37:58

Answer 4

A:

I have the following code: ... (code that uses Duff's device) ...

It is an unrolled loop with a switch statement to jump in at the right spot. The size is however a constant since it is a template parameter. Could I specialize this for certain size values?

You certainly can, but there will be a lot of boilerplate involved, for each possible size. With appropriate levels of abstraction, and use of the Boost metaprogramming and preprocessor libraries you might even get this down to something reasonable.

Your maintenance costs will go up, so I'll echo others and recommend you make sure you really should do this now that you know you can.

Max Lybbert 2009-07-08 20:25:40

Answer 5

+3 A:

I would tend to do it recursively with templates.

E.g. :

template<class TOp,int factor>
struct recursive_unroll {
    __forceinline static void result( TOp& k ) {
        k();
        recursive_unroll<TOp,factor-1>::result( k );
    }
};

template<class TOp>
struct recursive_unroll<TOp,0> {
    __forceinline static void result( TOp& k ) {}
};


struct  op    {
    op( const char* s ) : res( 0 ), pStr( s )   {}

    unsigned int res;
    const char* pStr;        
    __forceinline void  operator()()  {
        res *= 4;
        res += *pStr;
        ++pStr;
        //std::cout << res << std::endl;
    }
};

char str[] = "dasjlfkhaskjfdhkljhsdaru899weiu";

int _tmain(int argc, _TCHAR* argv[])
{
    op tmp( str );
    recursive_unroll<op,sizeof( str ) >::result( tmp );
    std::cout << tmp.res << std::endl;
    return 0;
}

This produces optimal code for me. Without __forceinline the code is not properly inlined.

You should always bench your code before using such optimizations. And then you should look at the assembly and decipher what your compiler already does for you. But in this case, it seems to be an boost (for me).

__forceinline is a Microsoft Visual Studio specific extension. The compiler should generate optimal code, but for this it doesn't. So here I used this extension.

Christopher 2009-07-27 22:08:56

I like this one. Too bad my compiler does not compile it :-(.

James Dean 2009-07-28 12:34:39

Compiles for me under VS 2008. What is the error your compiler gives?

Christopher 2009-07-28 12:41:52

__forceinline is a MS extension.

jalf 2009-10-12 13:43:20

For most compilers, there is an equivalent attribute. In a world where compilers were perfect, I would prefer to drop it, but in this sample it seems to be needed to get optimal code. (As careful studies of the generated assembler indicate)

Christopher 2009-10-12 16:27:18

Updated to point out, that __forceinline is compiler dependent and may not be needed.

Christopher 2009-10-12 16:30:37

Answer 6

A:

I dont know if the semantics is right, but in this case the function template itself could be recursive:

template <int size>
inline uint hashfn( const char* pStr ) {
    uint result = *pStr;
    result *= 4;
    return result + hashfn<size-1>(++pStr);
}

//case 0 stops
template <>
inline uint hashfn<0>( const char* pStr ) {
    return 0;
}

As Christopher said, just be sure that it gets inlined...

ogoid 2009-07-28 02:42:57

as the saying goes... for every problem there is a solution that is both simply and wrong... the code I posted won't work as is because you would need to pass the previous calculation to the next iteration, not just add them as I've done. This could be made by an functor as sugested by Christopher or by overloading hashfn to receive the previous step result.

ogoid 2009-07-28 04:10:13

Answer 7

A:

For code

template <size_t N>
void
function (float *a, float *b, float *c)
{
  for (size_t i = 0; i < N; ++i)
    {
      c[i] = a[i] * b[i];
    }
}

my compilers (msvc 2005, gcc 4.3) unroll loops automatically.

W55tKQbuRu28Q4xv 2009-10-12 12:13:17

ansaurus

tags:

views:

answers:

Can one unroll a loop when working with an integer template parameter?

related questions