I'm wondering about Nvidia's CUBLAS Library. Does anybody have experience with it? For example if I write a C program using BLAS will I be able to replace the calls to BLAS with calls to CUBLAS? Or even better implement a mechanism which let's the user choose at runtime?
What about if I use the BLAS Library provided by Boost with C++?