I am developing a program using the CUDA SDK and a 9600 1 GB NVIDIA card. In this program:

0) The kernel is passed a pointer to a 2D int array of size 3000x6 in its input arguments.

1) The kernel has to sort it on up to 3 keys (1st, 2nd & 3rd columns).

2) For this purpose, the kernel declares an array of int pointers of size 3000.

3) The kernel then populates the pointer array with pointers to the rows of the input array in sorted order.

4) Finally, the kernel copies the input array into an output array by dereferencing the pointer array.

This last step fails and it halts the PC.
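For reference, a minimal sketch of what the kernel does (identifiers are illustrative, not my actual code):

__global__ void sortRows(int *in, int *out)   // in/out: 3000x6 ints, both in device memory
{
    int *rows[3000];                          // step 2: array of int pointers
    for (int i = 0; i < 3000; ++i)
        rows[i] = &in[i * 6];                 // step 3: point at the rows (then reorder
                                              // these pointers into sorted order)
    for (int i = 0; i < 3000; ++i)            // step 4: copy out via the pointers
        for (int j = 0; j < 6; ++j)
            out[i * 6 + j] = rows[i][j];      // this dereference is where it fails
}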

Q1) What are the guidelines for dereferencing pointers in CUDA to fetch the contents of memory?

Even the smallest array, 20x2, does not work correctly; the same code works outside CUDA device memory (i.e. as a standard C program).

Q2) Isn't it supposed to work the same as in standard C using the '*' operator, or is there some CUDA API to be used for it?

+1  A: 

I just started looking into CUDA, but I literally just read this in a book. It sounds like it directly applies to you.

"You can pass pointers allocated with cudaMalloc() to functions that execute on the device.(kernals, right?)

You can use pointers allocated with cudaMalloc() to read or write memory from code that executes on the device .(kernals again)

You can pass pointers allocated with cudaMalloc to functions that execute on the host. (regular C code)

You CANNOT use pointers allocated with cudaMalloc() to read or write memory from code that executes on the host."

  • ^^ from "CUDA by Example" by Jason Sanders and Edward Kandrot, published by Addison-Wesley, yadda yadda, no plagiarism here.

Since you are dereferencing inside the kernel, maybe the opposite of the last rule is also true, i.e. you cannot use pointers allocated by the host to read or write memory from code that executes on the device.
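So, for example (my own hypothetical snippet, not from the book), I would expect something like this to be exactly the failing case:

int *h_arr = (int*)malloc(3000 * 6 * sizeof(int)); // plain host memory
myKernel<<<1, 1>>>(h_arr);  // compiles fine, but h_arr is a host address...
                            // ...so dereferencing it inside the kernel crashes/hangs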

Edit: I also just noticed a function called cudaMemcpy().

Looks like you would need to declare the 3000-int array twice in host code: once by calling malloc(), once by calling cudaMalloc(). Pass the CUDA one to the kernel, as well as the input array to be sorted. Then, after calling the kernel function:

cudaMemcpy(malloced_array, cudaMallocedArray, 3000*sizeof(int), cudaMemcpyDeviceToHost);

Like I said, though, I literally just started looking into this, so maybe there's a better solution.
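Putting that together, the whole host-side flow would be something like this (just a sketch of what I mean; sortKernel, grid and block are placeholders):

int *h_data = (int*)malloc(3000 * 6 * sizeof(int));  // host copy, filled with your data
int *d_in, *d_out;                                   // device copies
cudaMalloc((void**)&d_in,  3000 * 6 * sizeof(int));
cudaMalloc((void**)&d_out, 3000 * 6 * sizeof(int));
cudaMemcpy(d_in, h_data, 3000 * 6 * sizeof(int), cudaMemcpyHostToDevice);
sortKernel<<<grid, block>>>(d_in, d_out);            // kernel only ever sees device pointers
cudaMemcpy(h_data, d_out, 3000 * 6 * sizeof(int), cudaMemcpyDeviceToHost);
cudaFree(d_in);
cudaFree(d_out);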

Tom
Hi Tom, you should check out Thrust (http://code.google.com/p/thrust/): if it is applicable to your project it can be a great timesaver (both now and for maintenance).
Tom
A: 

CUDA code can use pointers in exactly the same manner as host code (e.g. dereference with * or [], normal pointer arithmetic and so on). However, it is important to remember that the location being accessed (i.e. the location to which the pointer points) must be visible to the GPU.

If you allocate host memory, using malloc() or std::vector for example, then that memory will not be visible to the GPU: it is host memory, not device memory. To allocate device memory you should use cudaMalloc(); pointers to memory allocated with cudaMalloc() can be freely accessed from the device, but not from the host.

To copy data between the two, use cudaMemcpy().

When you get more advanced, the lines can be blurred a little: using "mapped memory" it is possible to allow the GPU to access parts of host memory, but this must be handled in a particular way; see the CUDA Programming Guide for more information.
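For illustration only (a sketch; mapped memory needs device support, and the flag must be set before any other CUDA calls create the context; N, myKernel, grid and block are placeholders):

cudaSetDeviceFlags(cudaDeviceMapHost);                 // enable mapping before context creation
int *h_buf, *d_alias;
cudaHostAlloc((void**)&h_buf, N * sizeof(int), cudaHostAllocMapped);
cudaHostGetDevicePointer((void**)&d_alias, h_buf, 0);  // device-side alias of the host buffer
myKernel<<<grid, block>>>(d_alias);                    // GPU reads/writes host memory directly
cudaThreadSynchronize();                               // make the writes visible to the host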

I'd strongly suggest you look at the CUDA SDK samples to see how all this works. Start with the vectorAdd sample perhaps, and any that are specific to your domain of expertise. Matrix multiplication and transpose are probably easy to digest too.

All the documentation, the toolkit and the code samples (SDK) are available on the CUDA developer web site.

Tom