tags:
views: 209
answers: 3
I have a CUDA application that on one computer (with a GTX 275) works fine and on another, with a GeForce 8400 works about 100 times slower. My suspicion is that there is some kind of fallback that makes the code actually run on the CPU rather than on the GPU.

Is there a way to actually make sure that the code is running on the GPU?
Is this fallback documented somewhere?
What conditions may trigger it?

EDIT: The code is compiled for compute capability 1.1, which is what the 8400 has.

+1  A: 

If I remember correctly, you can list all available devices (and choose which device to use for your kernel) from the host code. You could try to determine whether the available device is the software-emulation device and issue a warning.
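A minimal sketch of that check using the CUDA runtime API. The `major == 9999` value is how the old device-emulation target reported itself in `cudaDeviceProp`; everything else here is standard device enumeration:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        fprintf(stderr, "No CUDA devices found\n");
        return 1;
    }
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        // The emulation device reports compute capability 9999.9999.
        if (prop.major == 9999) {
            fprintf(stderr, "Device %d is the emulation device!\n", i);
        } else {
            printf("Device %d: %s (compute %d.%d)\n",
                   i, prop.name, prop.major, prop.minor);
        }
    }
    return 0;
}
```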

Victor Nicollet
+5  A: 

Couldn't it just be that the gap in performance is that large? This link indicates that the 8400 operates at 22-62 GFLOPS and this link indicates that the GTX 275 operates at 1010.88 GFLOPS.

Andreas Brinck
wow.... that's _some_ gap
Javier
+2  A: 

There are a number of possible reasons for this.

  1. Presumably you're not using the emulation device. Can you run the device query sample from the SDK? That will show if you have the toolkit and driver installed correctly.

    You can also query the device properties from within your app to check what device you are attached to.

  2. The 8400 is much lower performance than the GTX275, so it could be real, but see the next point.

  3. One of the major changes in going from compute capability 1.1 to 1.2 and beyond is the way memory accesses are handled. In 1.1 you have to be very careful not only to coalesce your memory accesses but also to make sure that each half-warp is aligned; otherwise each thread will issue its own 32-byte transaction. In 1.2 and beyond, alignment is not such an issue since access degrades gracefully to minimise transactions.

    This, combined with the lower performance of the 8400, could also account for what you are seeing.
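To illustrate point 3, a sketch of a kernel that would hit exactly this penalty (the kernel name and the offset value are made up for the example). On a compute capability 1.1 device, each half-warp of 16 threads must read from a contiguous, properly aligned segment for the loads to coalesce; the offset below shifts every half-warp off its alignment boundary, so a 1.1 device issues a separate 32-byte transaction per thread, while a 1.2+ device still coalesces the same access into a small number of transactions:

```cuda
#include <cuda_runtime.h>

// Copies in[i + offset] to out[i]. With offset = 0 the loads coalesce on
// all devices; with an offset that is not a multiple of 16 floats, each
// half-warp's load is misaligned, which is catastrophic on compute 1.1
// but handled gracefully on compute 1.2 and later.
__global__ void copyShifted(const float *in, float *out, int n, int offset) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i + offset < n)
        out[i] = in[i + offset];  // misaligned load when offset % 16 != 0
}
```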

Tom