I have a CUDA application that runs fine on one computer (with a GTX 275) but about 100 times slower on another (with a GeForce 8400). My suspicion is that some kind of fallback makes the code actually run on the CPU rather than on the GPU.
Is there a way to actually make sure that the code is running on the GPU?
Is this fallback documented somewhere?
What conditions may trigger it?
EDIT: The code is compiled for compute capability 1.1, which is what the 8400 has.
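To make the first question concrete, here is a minimal sketch of the kind of check I have in mind. It is not taken from my application. It lists the devices the CUDA runtime reports and then checks the error status of one trivial kernel launch. The kernel `fill_index` and the `CHECK` macro are hypothetical names made up for this illustration, and picking device 0 is an assumption.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical trivial kernel: each thread writes its global index into out.
__global__ void fill_index(int *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = i;
}

// Hypothetical helper: print the failing call and leave main() on any runtime error.
#define CHECK(call)                                                 \
    do {                                                            \
        cudaError_t err_ = (call);                                  \
        if (err_ != cudaSuccess) {                                  \
            std::fprintf(stderr, "%s failed: %s\n", #call,          \
                         cudaGetErrorString(err_));                 \
            return 1;                                               \
        }                                                           \
    } while (0)

int main()
{
    // List every device the runtime can see, with its compute capability.
    int count = 0;
    CHECK(cudaGetDeviceCount(&count));
    for (int d = 0; d < count; ++d) {
        cudaDeviceProp prop;
        CHECK(cudaGetDeviceProperties(&prop, d));
        std::printf("device %d: %s, compute capability %d.%d, %d multiprocessors\n",
                    d, prop.name, prop.major, prop.minor,
                    prop.multiProcessorCount);
    }
    if (count == 0) {
        std::fprintf(stderr, "no CUDA device reported\n");
        return 1;
    }

    // Assumption: device 0 is the one the application uses.
    CHECK(cudaSetDevice(0));

    const int n = 256;
    int *d_out = NULL;
    CHECK(cudaMalloc((void **)&d_out, n * sizeof(int)));

    fill_index<<<(n + 127) / 128, 128>>>(d_out, n);
    // A launch that cannot run on this GPU is reported here, not silently ignored.
    CHECK(cudaGetLastError());

    // The blocking copy also waits for the kernel to finish.
    int h_out[n];
    CHECK(cudaMemcpy(h_out, d_out, n * sizeof(int), cudaMemcpyDeviceToHost));
    CHECK(cudaFree(d_out));

    std::printf("last element written by the kernel: %d\n", h_out[n - 1]);
    return 0;
}
```

If this reports the expected device name and compute capability on both machines and no call fails, I would take that as a sign the kernels really are launched on the GPU. Comparing the printed multiprocessor counts of the two cards might then explain part of the speed difference.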