CUDA program results are always zero on hardware, correct in emulation?

Hi all!

I am having a weird problem. I have written CUDA code that executes correctly in emulation, with all results showing up. However, when it is executed on the hardware (a "G210"), the values in the result memory are always 0.

I am passing two vectors to the kernel: one holds random values, the other is initialized to zero. The code copies the first vector to shared memory, does some swapping and other operations, and then writes the results back to the second vector (the one initialized to zeros).

I am using double precision, compiling with the -arch sm_13 flag, and all memory allocations use sizeof(double).

I have checked that the kernel is invoked, and it is, so no problem there. The cudaMemcpy calls have no problems either.

What could be the problem? :( Why would it work in emulation but not on hardware?

I am quite confused. Any ideas?

+1 A:

Emulation mode is not an accurate simulation of the GPU - it doesn't attempt to simulate the behaviour of concurrent threads and all the problems that can arise from this. In order to debug your kernel you're probably going to have to break it down into smaller versions until you can identify the problem.
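As an illustration of the kind of concurrency issue that emulation can hide, here is a minimal sketch (not the asker's code) of a kernel that stages data in shared memory and then reads an element written by a different thread. The names `reverseBlock`, `countMismatches` and `BLOCK` are made up for this example. Without the `__syncthreads()` barrier, a thread could read a slot of `tile` before the thread responsible for it has written it.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define BLOCK 64  // hypothetical block size chosen for this sketch

// Reverses BLOCK elements through shared memory (single block only).
__global__ void reverseBlock(const float *in, float *out)
{
    __shared__ float tile[BLOCK];
    int t = threadIdx.x;

    tile[t] = in[t];   // each thread stages one element
    __syncthreads();   // wait until every thread in the block has written its slot

    out[t] = tile[BLOCK - 1 - t];  // safe: reads a slot written by another thread
}

// Runs the kernel once and returns the number of wrong elements (-1 on a CUDA error).
int countMismatches()
{
    float h_in[BLOCK], h_out[BLOCK];
    for (int i = 0; i < BLOCK; ++i) { h_in[i] = (float)i; h_out[i] = 0.0f; }

    float *d_in = NULL, *d_out = NULL;
    if (cudaMalloc((void **)&d_in, sizeof(h_in)) != cudaSuccess) return -1;
    if (cudaMalloc((void **)&d_out, sizeof(h_out)) != cudaSuccess) { cudaFree(d_in); return -1; }

    cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);
    reverseBlock<<<1, BLOCK>>>(d_in, d_out);
    // This blocking copy also waits for the kernel to finish.
    cudaError_t err = cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);

    cudaFree(d_in);
    cudaFree(d_out);
    if (err != cudaSuccess) return -1;

    int bad = 0;
    for (int i = 0; i < BLOCK; ++i)
        if (h_out[i] != h_in[BLOCK - 1 - i]) ++bad;
    return bad;
}

int main()
{
    printf("mismatches: %d\n", countMismatches());
    return 0;
}
```

Shrinking a failing kernel down to a small case like this one, and then adding the original operations back one at a time, is one way to do the "break it down" step described above.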

Paul R 2010-05-21 05:55:08

+1 A:

Emulation uses a different compiler, all memory is on the host, only one thread runs at a time, and so on. If you find a bug in emulation, you have found a bug in your code. If it works in emulation, that doesn't mean you don't have bugs. Your question is basically: I have buggy code, what is my bug?

Advice: check return values for everything. Learn how to do error checking. Realize that errors on the device can show up asynchronously with respect to CPU code. Use the debugger (maybe buy a cheap low-end NVIDIA GPU to make this easier). Give cuPrintf a try if you prefer printf to debugging (available on the CUDA forums). Ask for help on the NVIDIA CUDA forums.
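To make the error-checking advice concrete, here is a minimal sketch. `CUDA_CHECK` and `copyKernel` are hypothetical names invented for this example, not part of the CUDA toolkit. The sketch uses `cudaDeviceSynchronize()`, which is the current runtime call; toolkits from the time of this thread used `cudaThreadSynchronize()` for the same purpose.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro: print the CUDA error string and abort on failure.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Placeholder kernel: copies in[] to out[].
__global__ void copyKernel(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i];
}

int main()
{
    const int n = 256;
    float h_in[n], h_out[n];
    for (int i = 0; i < n; ++i) { h_in[i] = (float)i; h_out[i] = 0.0f; }

    float *d_in = NULL, *d_out = NULL;
    CUDA_CHECK(cudaMalloc((void **)&d_in, n * sizeof(float)));
    CUDA_CHECK(cudaMalloc((void **)&d_out, n * sizeof(float)));
    CUDA_CHECK(cudaMemcpy(d_in, h_in, n * sizeof(float), cudaMemcpyHostToDevice));

    copyKernel<<<(n + 127) / 128, 128>>>(d_in, d_out, n);
    CUDA_CHECK(cudaGetLastError());       // errors detected at launch time
    CUDA_CHECK(cudaDeviceSynchronize());  // errors raised while the kernel was running

    CUDA_CHECK(cudaMemcpy(h_out, d_out, n * sizeof(float), cudaMemcpyDeviceToHost));
    printf("h_out[10] = %f\n", h_out[10]);

    CUDA_CHECK(cudaFree(d_in));
    CUDA_CHECK(cudaFree(d_out));
    return 0;
}
```

The two checks after the launch matter because a kernel launch returns immediately: the call itself reports nothing, so a failed launch can otherwise go unnoticed and leave the output buffer at its initial values.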

Nathan Whitehead 2010-05-21 06:19:16

+2 A:

If I remember correctly, the GeForce 210 does not support compute capability 1.3, i.e. it does not support doubles.

Try rewriting your code to use single precision (float) and compile with -arch=sm_12.
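One way to confirm what the installed GPU supports is to query it at run time. The sketch below uses the runtime calls `cudaGetDeviceCount` and `cudaGetDeviceProperties`; `supportsDouble` is a hypothetical helper written for this example, based on the rule that double precision requires compute capability 1.3 or higher.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: double precision needs compute capability 1.3 or higher.
static bool supportsDouble(int major, int minor)
{
    return major > 1 || (major == 1 && minor >= 3);
}

int main()
{
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Print name, compute capability and double support for every device.
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess) continue;
        printf("Device %d: %s, compute capability %d.%d, double precision: %s\n",
               dev, prop.name, prop.major, prop.minor,
               supportsDouble(prop.major, prop.minor) ? "yes" : "no");
    }
    return 0;
}
```

Running a check like this before choosing the -arch value avoids building for a capability the card does not have.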

Tom 2010-05-21 09:18:12

You are absolutely right! My mistake. I have just done so and it worked fine, with correct results. Thanks loads. However, there are no speed-up gains; on the contrary, the GPU is slower :(

Orion Nebula 2010-05-21 12:31:47
