tags:

views:

47

answers:

1

I want to use assembly code in CUDA C code in order to reduce expensive operations, as we do with asm in C programming. I've googled for it but found nothing.

Is it possible?

+2  A: 

No, you can't; there is nothing like the asm construct from C/C++. What you can do is tweak the generated PTX assembly and then use it with CUDA.

See this for an example.
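As a minimal sketch of that workflow (the kernel and file names here are made up for illustration), you can ask nvcc to emit PTX, edit it, and load the result through the driver API:

    // scale.cu -- a trivial kernel, just to have something to compile to PTX
    __global__ void scale(float *data, float factor, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            data[i] *= factor;   // each thread scales one element
    }

    // Emit PTX instead of an executable:
    //   nvcc -ptx scale.cu -o scale.ptx
    // The edited scale.ptx can then be loaded at run time with the driver
    // API (cuModuleLoad / cuModuleGetFunction) and launched from there.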

But for GPUs, assembly optimizations are NOT necessary; you should do other optimizations first, such as memory coalescing and occupancy. See the CUDA Best Practices Guide for more information.
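To illustrate the coalescing point (a sketch with made-up kernel names): consecutive threads in a warp should touch consecutive addresses, rather than each thread striding through its own chunk of memory.

    // Coalesced: thread i reads/writes element i, so a warp touches one
    // contiguous segment and needs few memory transactions.
    __global__ void copy_coalesced(const float *in, float *out, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            out[i] = in[i];
    }

    // Uncoalesced: each thread walks its own contiguous chunk, so addresses
    // within a warp are 'stride' elements apart and far more transactions
    // are issued for the same amount of data.
    __global__ void copy_strided(const float *in, float *out, int n, int stride)
    {
        int start = (blockIdx.x * blockDim.x + threadIdx.x) * stride;
        for (int k = 0; k < stride && start + k < n; ++k)
            out[start + k] = in[start + k];
    }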

Matias Valdenegro
Second that! In my experience, CUDA programs are almost always memory bound, not compute bound.
mch
Thanks to you both. I just wanted to reduce the number of division and modulo operations, but now I will focus on the memory issue.
superscalar
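If those divisors happen to be powers of two (an assumption, not something stated in the thread), the standard trick is a shift and a mask; the helper below is purely illustrative:

    // Only valid when 'width' is a power of two (e.g. a tile or block size).
    __device__ int2 split_index(int i, int width)
    {
        int shift = __ffs(width) - 1;   // log2(width) when width is a power of two
        int row   = i >> shift;         // equivalent to i / width
        int col   = i & (width - 1);    // equivalent to i % width
        return make_int2(col, row);
    }

When the divisor is a compile-time constant the compiler usually performs this substitution itself; writing it out mainly helps when the divisor is a run-time value that you know is a power of two.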
Note: if you're compiling for the newest architecture (using the flag -arch sm_20), division and square root are now fully compliant with the IEEE floating-point specification by default. If you've got a bunch of divisions and you're also using -arch sm_20, you might consider switching back to the faster, less compliant version for a performance gain using the flag -prec-div=false. http://forums.nvidia.com/lofiversion/index.php?t170749.html
M. Tibbits
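To make that trade-off concrete, a sketch (file and kernel names invented) showing where the flag applies:

    // divide.cu -- on sm_20 the single-precision '/' below compiles to
    // IEEE-compliant division by default.
    __global__ void divide(float *out, const float *a, const float *b, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            out[i] = a[i] / b[i];
    }

    // Default, IEEE-compliant (slower) division:
    //   nvcc -arch sm_20 divide.cu
    // Faster, less accurate division, closer to the pre-sm_20 behaviour:
    //   nvcc -arch sm_20 -prec-div=false divide.cu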