This is part of my header file ("aes_locl.h"):
.
.
# define SWAP(x) (_lrotl(x, 8) & 0x00ff00ff | _lrotr(x, 8) & 0xff00ff00)
# define GETU32(p) SWAP(*((u32 *)(p)))
# define PUTU32(ct, st) { *((u32 *)(ct)) = SWAP((st)); }
.
.
Now, in my .cu file, I have declared a __global__ function and included the header file like this:
#include "aes_locl.h"
.....
__global__ void cudaEncryptKern(u32* _Te0, u32* _Te1, u32* _Te2, u32* _Te3, unsigned char* in, u32* rdk, unsigned long* length)
{
    u32 *rk = rdk;
    u32 s0, s1, s2, s3, t0, t1, t2, t3;
    s0 = GETU32(in + threadIdx.x*(i) ) ^ rk[0];
}
This leads to the following error message:
error: calling a host function from a __device__/__global__ function is only allowed in device emulation mode
I have some example code where the programmer calls the macro in exactly this way. Can I call it like this, or is it not possible at all? If it is not, I would appreciate some hints on the best approach to rewriting the macros and assigning the desired value to s0.
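One direction I could imagine, sketched under the assumption that the error comes from _lrotl and _lrotr being host-only compiler intrinsics: replace the macros with small functions that build the big-endian word byte by byte and so need no rotate intrinsic. The names getu32, putu32 and HD below are hypothetical and not part of aes_locl.h, and u32 is assumed to be a 32-bit unsigned type. On a little-endian machine this should give the same value as the SWAP-based GETU32.

```cpp
#include <cstdint>

// HD is a hypothetical helper macro: under nvcc it marks a function as
// callable from both host and device code; in a plain C++ build it is empty.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

typedef uint32_t u32;  // assumption: u32 is a 32-bit unsigned integer

// Load four bytes as a big-endian 32-bit word, without _lrotl/_lrotr
// and without an unaligned u32 pointer cast.
HD inline u32 getu32(const unsigned char *p)
{
    return ((u32)p[0] << 24) | ((u32)p[1] << 16) |
           ((u32)p[2] << 8)  |  (u32)p[3];
}

// Store a 32-bit word as four bytes in big-endian order.
HD inline void putu32(unsigned char *ct, u32 st)
{
    ct[0] = (unsigned char)(st >> 24);
    ct[1] = (unsigned char)(st >> 16);
    ct[2] = (unsigned char)(st >> 8);
    ct[3] = (unsigned char)(st);
}
```

With something like this, the line in the kernel would become s0 = getu32(in + threadIdx.x*(i)) ^ rk[0]; and reading single bytes also avoids misaligned 32-bit loads from the unsigned char buffer.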
Thank you very much in advance!