A: 

Ben,

I may not be understanding you, but I think what you are asking is impossible.

Disclaimer: I have little experience with CUDA; I've been doing GPGPU programming with AMD/ATI's Stream SDK for their Radeon GPUs instead.

When you have the GPU execute a kernel, whether it's CUDA or CAL or whatever, the GPU has its own memory and its own physical address space. It can't access any of the memory in your program's virtual memory back on the CPU. The only data it can access are the buffers that you've created with cudaMalloc (or some other CUDA API) and explicitly or implicitly copied from your program's memory, over the PCI Express bus, into the video card's memory using cudaMemcpy (or similar).

So you can put an arbitrary pointer into a data buffer and give that data buffer to the GPU, but the pointer is an address in your program's virtual address space. The GPU is not executing in that address space; it's executing in its own memory space, so it can't dereference the pointer to get to the data.
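To make the point concrete, here is a minimal host-only sketch in plain C++, not CUDA. It assumes a second std::vector can stand in for the video card's memory and that a byte-wise copy can stand in for a cudaMemcpy-style transfer; copy_buffer, points_into and read_at are hypothetical helper names made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>
#include <vector>

// Byte-wise copy of a buffer into separate storage. This only simulates a
// host-to-device transfer; no GPU is involved.
std::vector<int> copy_buffer(const std::vector<int>& src) {
    std::vector<int> dst(src.size());
    if (!src.empty()) {
        std::memcpy(dst.data(), src.data(), src.size() * sizeof(int));
    }
    return dst;
}

// True if p points at an element stored inside buf.
// std::less gives a well-defined ordering even for unrelated pointers.
bool points_into(const int* p, const std::vector<int>& buf) {
    std::less<const int*> before;
    const int* first = buf.data();
    const int* last = first + buf.size();
    return !before(p, first) && before(p, last);
}

// Index-based access: the index is relative to the start of whichever
// buffer it is applied to, so it stays meaningful after a copy.
int read_at(const std::vector<int>& buf, std::size_t index) {
    assert(index < buf.size());
    return buf[index];
}
```

If host holds {10, 20, 30} and p is &host[1], then after device = copy_buffer(host) the pointer p still refers to storage inside host and not inside device, while read_at(device, 1) finds the same value as read_at(host, 1). A real GPU is worse off than this simulation, since it cannot follow p back to host memory at all.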

Even if your system is a "Unified Memory Architecture", where a cheap video chip does not have its own memory (as on many laptops) and the GPU uses a portion of the host CPU's memory, it still can't work. The pointer is in the virtual address space of your process; it isn't a physical memory address.

For that reason, I think it is impossible for a GPU kernel to usefully dereference a host pointer stored in a data buffer; any address the kernel follows has to refer to the GPU's own memory.

Die in Sente
A: 

Why don't you just use the so-called "packed" data representation? This approach lets you place all the data you need into a single one-dimensional byte array. E.g., if you need to store
struct data
{
  int nFiles;
  int nNames;
  int* files;
  int* names;
};
you can just store this data in the array this way:
[struct data (7*4=28 bytes)
  [int nFiles=3 (4 bytes)]
  [int nNames=2 (4 bytes)]
  [file0 (4 bytes)]
  [file1 (4 bytes)]
  [file2 (4 bytes)]
  [name0 (4 bytes)]
  [name1 (4 bytes)]
]
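
A minimal sketch of that layout in plain host-side C++, assuming the file and name entries are ints as in the struct above. The function names pack, num_files, num_names, file_at and name_at are hypothetical and not part of any CUDA API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Pack the two counts followed by the entries into one flat array:
// [nFiles][nNames][files...][names...]
std::vector<int> pack(const std::vector<int>& files,
                      const std::vector<int>& names) {
    std::vector<int> out;
    out.reserve(2 + files.size() + names.size());
    out.push_back(static_cast<int>(files.size()));
    out.push_back(static_cast<int>(names.size()));
    out.insert(out.end(), files.begin(), files.end());
    out.insert(out.end(), names.begin(), names.end());
    return out;
}

// Accessors locate each section by offset from the start of the packed
// array, so no embedded pointers are needed.
int num_files(const int* packed) { return packed[0]; }
int num_names(const int* packed) { return packed[1]; }

int file_at(const int* packed, int i) {
    assert(i >= 0 && i < num_files(packed));
    return packed[2 + i];
}

int name_at(const int* packed, int i) {
    assert(i >= 0 && i < num_names(packed));
    // Names start right after the two counts and all file entries.
    return packed[2 + num_files(packed) + i];
}
```

With three files and two names, pack returns seven ints, matching the layout above. The packed array could then be copied to the GPU as one block; in a CUDA build the accessors would need a `__device__` qualifier to be callable from a kernel.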

A: 

I think constant memory is 64 KB, and you cannot allocate it dynamically using cudaMalloc. It has to be declared statically as constant, say, __device__ __constant__ data mydata[100]; and likewise you don't need to free it. Also, you shouldn't pass a reference to it via a pointer; just access it as a global variable. I tried doing something similar and it gave me a segfault (in device emulation mode).