I want to use constant memory which will be accessed by all threads across all of my kernels.

The declaration is something like this:

extern __constant__ float smooth[8 * 1024];

I am copying data to this variable using:

cudaMemcpyToSymbol("smooth", smooth_local, smooth_size, 0, cudaMemcpyHostToDevice);

where smooth_size is 7 KB.

This gives me incorrect output. When I ran it in -deviceemu mode and printed the contents of both variables inside the kernel, smooth was all zeroes while smooth_local was correct.

Printing the contents immediately after the cudaMemcpyToSymbol call still gives all zeroes.

Can anyone shed light on my problem?

+2  A: 

A CUDA constant memory declaration looks like this:

__constant__ float smooth[8 * 1024];

Note that CUDA constant memory is local to its translation unit (i.e. it is implicitly declared static), at least in the default whole-program compilation mode. This is one of the annoying limitations of CUDA: if you need to share these values between separate .cpp/.cu files, you have to redeclare the memory in each .cpp/.cu file that needs it, and you have to call cudaMemcpyToSymbol again in each of them. Finally, you are limited to a total of 64 KB of constant memory across your entire CUDA program.
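As a rough illustration, here is a minimal single-file sketch of the pattern described above. It assumes a reasonably recent toolkit, where the symbol itself is passed to cudaMemcpyToSymbol rather than its name as a string (as far as I know, the string form was removed in CUDA 5.0). The kernel name copy_smooth, the helper run_smooth_demo and the test data are made up for the example; only smooth, smooth_local and the 7 KB size come from the question.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// File-scope constant array, visible only inside this translation unit.
// 8 * 1024 floats = 32 KB, which fits within the 64 KB constant memory limit.
__constant__ float smooth[8 * 1024];

// Hypothetical kernel: each thread copies one constant element to global memory
// so the host can check what the device actually sees.
__global__ void copy_smooth(float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = smooth[i];
    }
}

// Hypothetical helper: returns 0 if the device read back the uploaded values,
// 1 on any CUDA error or mismatch.
int run_smooth_demo()
{
    const int n = 7 * 1024 / sizeof(float);        // 7 KB of floats, as in the question
    const size_t smooth_size = n * sizeof(float);

    std::vector<float> smooth_local(n), result(n, 0.0f);
    for (int i = 0; i < n; ++i) {
        smooth_local[i] = 0.5f * i;                // arbitrary test data
    }

    // Pass the symbol itself, not the string "smooth".
    cudaError_t err = cudaMemcpyToSymbol(smooth, smooth_local.data(), smooth_size,
                                         0, cudaMemcpyHostToDevice);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaMemcpyToSymbol: %s\n", cudaGetErrorString(err));
        return 1;
    }

    float *d_out = nullptr;
    err = cudaMalloc(&d_out, smooth_size);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaMalloc: %s\n", cudaGetErrorString(err));
        return 1;
    }

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    copy_smooth<<<blocks, threads>>>(d_out, n);
    err = cudaGetLastError();                      // catches launch errors
    if (err == cudaSuccess) {
        // This blocking copy also waits for the kernel to finish.
        err = cudaMemcpy(result.data(), d_out, smooth_size, cudaMemcpyDeviceToHost);
    }
    cudaFree(d_out);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "kernel or copy back: %s\n", cudaGetErrorString(err));
        return 1;
    }

    for (int i = 0; i < n; ++i) {
        if (result[i] != smooth_local[i]) {
            return 1;                              // device saw different data
        }
    }
    return 0;
}

int main()
{
    return run_smooth_demo();
}
```

Checking the return value of cudaMemcpyToSymbol is worth doing here, since a failed copy leaves the constant array zero-initialized, which would look a lot like the all-zero output described in the question.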

Eric