views: 209

answers: 1
I'm trying to reduce the number of instructions and constant memory reads for a CUDA kernel.

As a result, I have realised that I can pull the tile sizes out of constant memory and turn them into macros. How do I define macros that evaluate to constants at compile time, so that I can adjust just three values and reduce the number of instructions performed in each kernel?

Here's an example:

#define TX 8
#define TY 6
#define TZ 4

#define TX2 (TX * 2)
#define TY2 (TY * 2)

#define OVER_TX (1.0f / float(TX))

Maybe this already happens (or is handled by the nvcc compiler), but clearly I want the second block of macros to be evaluated at compile time rather than merely text-substituted into the code, so that the arithmetic is not performed in every kernel. Any suggestions?

+2  A: 

Modern compilers will typically evaluate constant expressions such as these at compile time wherever possible, so you should be OK. This is also true for properly defined constants (i.e. using const rather than the "old skool" #define method).

Paul R
Ok great, that's logical. What about more complex constructions such as #define OVER_TX (1.0f / float(TX))? It's even more critical that this not be performed at runtime.
Dan
If TX is known at compile time then yes, that should also get simplified. For performance-critical code like this, though, you ought to get into the habit of looking at the compiler output to see what is actually being generated - this can give useful insights into how to write your code to get the best out of the compiler, and can also suggest further micro-optimisations.
Paul R