views:

65

answers:

2

Hello

Can I use template to create several instantiations of some function, different only in some constant parameter? The number of alternatives for this parameter is fixed. E.g.

I want not to rewrite (where upper is in 1..32 in powers of two)

funct(param, int upper)
 {
  some_loops(..)
     some_heavy_code_fast_for_const_and_slow_for_variable(upper)
 }

into a set of

funct_with_upper_is_1(param) // upper =1
 { manually_copied_code...heavy(1) }
funct_with_upper_is_2(param) // upper =2
 { manually_copied_code...heavy(2) }
funct_with_upper_is_4(param) // upper =4
 { manually_copied_code...heavy(4) }
funct_with_upper_is_8(param) // upper =8
 { manually_copied_code...heavy(8) }

but into templated

template<int upper>
 funct_with_fixed_upper(param)
 { the_original_code....heavy(upper) }

and then

template<upper=1> funct_with_fixed_upper(param);
template<upper=2> funct_with_fixed_upper(param);
template<upper=4> funct_with_fixed_upper(param);
template<upper=8> funct_with_fixed_upper(param);

Is this possible with C++ tempaltes?

== Verbose mode on ==

I have a lot C++ files with code like that

function_typical(long long double ***A, long long double ***B, int const_1, int const_2)
// the type "long long double" here is very correct, this is extension of compiler
{

   for(int i=1;i<100000-1;i++)
      for(int j=1;j<100000-1;j++)
         for(int k=0;k<const_1;k++)
            for(int l=k;l<const_2;l++) {
                // some cray work with array like
                A[i][j][l-k]+=(B[i][j][l-k]+A[i+1][j][l-k]+A[i][j+1][l-k]-A[i-1][j][k]-A[i][j-1][l-k]/2.2)/88.3;
                if(A[i][j][l-k]>sin_lld(A[i][j-1][l-k])){B[i][j][l-k]=A[i][j][k]*4;}
            }
 }

This is just an example, but:

  • I can't interchange the loops;
  • the 2 outer loops, i & j have a lot of iterations
  • the 2 inner (nested), k& l have a bit of iterations, number of which is passed into the function_typical and the set of them are fixed, e.g. const_1 and const_2 is one of pairs: (2,3), (4,5), (3,5). Total number of allowed pairs is smaller then 10.

The problem with this code is its speed is very low. If I will fix const_1 and const_2 in this code to numberic constants, compiler will do a great job in optimizing (e.g unrolling all k and all l iterations, doing a some smart job).

But I physically can't change every typical-like function of every set of (const_1 and const_2) pair. Also, constant propagator of compiler can't propagate constants of the set info the function (this is a networking server and the client does a selection of some const_1 and const_2 pair form fixed set).

So I know in compile-time all the alternatives of pairs. But I have no chance of rewrite every function by hand.

== verbose mode off ==

Thanks in advance

+5  A: 

Absolutely, this is entirely possible. If you take an int by template, it is valid wherever a constant expression is valid.

template<int const_1, int const_2> function_typical(long long double ***A, long long double ***B)
// the type "long long double" here is very correct, this is extension of compiler
{

   for(int i=1;i<100000-1;i++)
      for(int j=1;j<100000-1;j++)
         for(int k=0;k<const_1;k++)
            for(int l=k;l<const_2;l++) {
                // some cray work with array like
                A[i][j][l-k]+=(B[i][j][l-k]+A[i+1][j][l-k]+A[i][j+1][l-k]-A[i-1][j][k]-A[i][j-1][l-k]/2.2)/88.3;
                if(A[i][j][l-k]>sin_lld(A[i][j][l-k])){B[i][j][l-k]=A[i][j][k]*4;}
            }
 }

This should re-compile straight, but you'll have to alter the call sites. Also, don't forget that templated code has to have the full source in all translation units and can't be defined in just one.

DeadMG
" alter the call site" is not a problem, but
osgx
'full source in all translation units' is a problem. There are almost a megabyte of this typical functions.
osgx
The good news is that all calls a going from a very small set of network-interfacing functions, each of them uses only part of typical functions.
osgx
If you've got a megabyte of this 'typical' function because of duplication, the template method will simplify maintenance, and code base size, but not compiled code size - the template instantiations will expand to the same size. If "function_typical" always has the outer loop numbers the same, and the inner based on templates, and the work in the innermost defined by a function, you could template the function call as well, and reduce your maintenance more. It'll still have the same overall compiled size, however.
Harper Shelby
Well, either you have one function with "variable constants" or you have to generate all the possible functions. Whether that's done manually by you or automatically with templates by the compiler, they have to exist. You can have both.
miked
The compiler will see a code of `function_typical<2,3>(A1, B2)` as the crazy code with constants in the nested loops? Is the syntax I use here right for such templated function call?
osgx
Harper Shelby, you are welcome to copy your comment in the separate answer to get your +1 and some comments from me.
osgx
Yeah, this kind of thing exactly what "non-type template parameters" are for. Template instantiation is a Turning-complete compile-time computation facility. Some may feel this to be a bit frightening, but my favorite is the C++ source that prints the prime numbers in a sequence of compiler warning messages.
Marsh Ray
+1  A: 

Provided your original template:

template<int upper> 
  funct_with_fixed_upper(param)
  { the_original_code....heavy(upper) }

then when you call it, you would do so like this:

 funct_with_fixed_upper<1>(param);
 funct_with_fixed_upper<2>(param);`

if you need to specialize any of them based on the constant you would do it like this:

 template<> funct_with_fixed_upper<1>(param) { // code here };

and as others have already said, this would simplifiy your code maintence but would not really reduce the size of code compiled as the compiler would still expand this out...

diverscuba23
Yes, it is correct, I want to expand the code by specialized the constant pairs. Also I want compiler to full unrolling of two nested loops (this will expand code more). But! Here I need the highest possible performance for every function, in trade of code size. Also I need to do a smallest change to the code.
osgx
"specialize any of them based on the constant" no, the code is the same, only the constant is different for every variant. I hope, my compiler can optimize the constant case very good.
osgx