I am calling cudaMemcpy and the copy returns successfully; however, the source values do not appear to be copied to the destination. I wrote a similar piece using memcpy() and that works fine. What am I missing here?

// host externs
extern unsigned char landmask[DIMX * DIMY];

// use device constant memory for landmask
unsigned char *tempmask;
__device__ unsigned char *landmask_d;

..

void checkCUDAError(const char* msg) {
cudaError_t err = cudaGetLastError();
  if (cudaSuccess != err) {
    fprintf(stderr, "Cuda error: %s: %s.\n", msg, cudaGetErrorString(err));
    exit(EXIT_FAILURE);
  }
}

..

// try a memcpy
size_t landMemSize = DIMX * DIMY * sizeof(char);
tempmask = (unsigned char*)malloc(landMemSize);
memcpy(tempmask, landmask, DIMX * DIMY);
if (landmask[0] != *tempmask) printf("FAILURE!\n");

// allocate device memory for landmask
cudaMalloc((void **)&landmask_d, landMemSize);
printf("allocating %zu Kbytes for landmask\n", landMemSize/1024);
checkCUDAError("memory allocation");

// copy landmask from host to device
cudaMemcpy(landmask_d, landmask, landMemSize, cudaMemcpyHostToDevice);
checkCUDAError("memory copy");
if (landmask[0] != *landmask_d) printf("FAILURE!\n");

None of the checkCUDAError() calls report any problem, yet the second FAILURE is printed. Why?

+3  A: 
 if (landmask[0] != *landmask_d) printf("FAILURE!\n");

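You are comparing values in different memory domains: landmask lives in host memory, while landmask_d points to device memory, which host code cannot dereference directly. You should first copy the memory from the device back to the CPU and then compare.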
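As a minimal sketch of that check (the helper name roundTripMismatches and its return convention are made up for illustration and are not from the question's code), the device buffer can be copied back into a scratch host buffer and compared there. It assumes the device pointer is an ordinary host-side pointer variable rather than one declared with __device__:

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: copies n bytes to a fresh device buffer, copies them
// back into a scratch host buffer, and returns the number of mismatching
// bytes (0 means the round trip preserved the data).
// Returns -1 if an allocation or a copy fails.
long roundTripMismatches(const unsigned char *hostSrc, size_t n) {
    unsigned char *dev = NULL;  // host variable that holds a device address
    unsigned char *check = (unsigned char *)malloc(n);
    if (check == NULL) return -1;

    if (cudaMalloc((void **)&dev, n) != cudaSuccess) {
        free(check);
        return -1;
    }

    long mismatches = -1;
    if (cudaMemcpy(dev, hostSrc, n, cudaMemcpyHostToDevice) == cudaSuccess &&
        cudaMemcpy(check, dev, n, cudaMemcpyDeviceToHost) == cudaSuccess) {
        // Both buffers are now in host memory, so comparing them is valid.
        mismatches = 0;
        for (size_t i = 0; i < n; ++i) {
            if (check[i] != hostSrc[i]) ++mismatches;
        }
    }

    cudaFree(dev);
    free(check);
    return mismatches;
}
```

With the question's data, a call such as roundTripMismatches(landmask, DIMX * DIMY) would be expected to return 0 when both copies succeed.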

aaa
Removing __device__, rebuilding, and then running in cuda-gdb and stepping through the cudaMemcpy() call, the debugger still shows different values:

preprocess () at process.cu:88
88 cudaMemcpy(landmask_d, landmask, landMemSize, cudaMemcpyHostToDevice);
(cuda-gdb) s
89 checkCUDAError("memory copy");
(cuda-gdb) p *landmask_d
$4 = 0 '\0'
(cuda-gdb) p landmask[0]
$5 = 1 '\001'
timbo
Yes, it seems the only way to access device memory is from within a kernel. I have validated my cudaMemcpy calls by simply doing a copy, altering the source (in host memory), and then doing a copy back and verifying that the source has been restored to its original value. Unfortunately, this also holds true in cuda-gdb, and as threads can only be accessed on a per-warp basis, looking at individual register values can be challenging at times.
timbo
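The copy, alter, copy-back validation described in the comment above could look roughly like the following sketch. The helper name restoredAfterCopyBack is hypothetical, and inverting every byte is just one assumed way of "altering the source":

```cuda
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

// Hypothetical helper: uploads hostBuf, overwrites hostBuf on the host, then
// downloads the device copy back into hostBuf. Returns true only if hostBuf
// ends up identical to its original contents.
bool restoredAfterCopyBack(unsigned char *hostBuf, size_t n) {
    unsigned char *saved = (unsigned char *)malloc(n);
    unsigned char *dev = NULL;
    if (saved == NULL) return false;
    memcpy(saved, hostBuf, n);  // reference copy of the original data

    bool ok = false;
    if (cudaMalloc((void **)&dev, n) == cudaSuccess) {
        if (cudaMemcpy(dev, hostBuf, n, cudaMemcpyHostToDevice) == cudaSuccess) {
            // Alter the host source so a stale buffer cannot pass the check.
            for (size_t i = 0; i < n; ++i) {
                hostBuf[i] = (unsigned char)~hostBuf[i];
            }
            if (cudaMemcpy(hostBuf, dev, n, cudaMemcpyDeviceToHost) == cudaSuccess) {
                ok = (memcmp(hostBuf, saved, n) == 0);
            }
        }
        cudaFree(dev);
    }

    if (!ok) memcpy(hostBuf, saved, n);  // leave the caller's data intact on failure
    free(saved);
    return ok;
}
```

Because the host buffer is overwritten before the copy back, a true result indicates that the data really did pass through device memory.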
@timbo I feel your pain. CUDA is exceptionally difficult to debug.
aaa