I have an existing MFC application that performs matrix computations using CPU-optimized BLAS libraries. I would like to add cuBLAS support to the project, and I have the following two questions:
1) I'm not sure whether I need to write my own CUDA kernels or specify thread and block configurations at this point. If so, which parts of the architecture would you recommend paying the most attention to when modifying the algorithm?
2) I would like to either (a) create a new Visual Studio project that uses cuBLAS, or (b) integrate cuBLAS into an existing MFC project. However, I'm having trouble configuring the Visual Studio project to work with the CUDA SDK, other than by following a guide like this, which may not work when I'm integrating with an existing project. What would you recommend?
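To make option (b) more concrete, here is a rough sketch of the kind of wrapper I imagine ending up with. It is only an illustration under several assumptions: the function name `gemm` and the macro `USE_CUBLAS` are hypothetical names made up for this example, the matrices are assumed to be single-precision and column-major (as in BLAS), and the cuBLAS branch assumes the project can already find `cublas_v2.h` and link against the CUDA runtime and cuBLAS libraries. Without the macro, the same function falls back to a plain CPU loop, so the rest of the application would not need to change.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

#ifdef USE_CUBLAS              // hypothetical project macro, not defined by CUDA
#include <cuda_runtime.h>
#include <cublas_v2.h>
#endif

// Computes C = A * B for column-major matrices:
// A is m x k, B is k x n, C is m x n. Returns true on success.
bool gemm(int m, int n, int k,
          const std::vector<float>& A,
          const std::vector<float>& B,
          std::vector<float>& C)
{
    assert(A.size() == static_cast<std::size_t>(m) * k);
    assert(B.size() == static_cast<std::size_t>(k) * n);
    C.assign(static_cast<std::size_t>(m) * n, 0.0f);

#ifdef USE_CUBLAS
    // GPU path: copy inputs to the device, call SGEMM, copy the result back.
    // No kernel, thread, or block configuration appears in this code.
    float* dA = nullptr;
    float* dB = nullptr;
    float* dC = nullptr;
    cublasHandle_t handle = nullptr;   // created per call here only for brevity
    const float alpha = 1.0f;
    const float beta = 0.0f;

    bool ok =
        cudaMalloc(reinterpret_cast<void**>(&dA), sizeof(float) * m * k) == cudaSuccess &&
        cudaMalloc(reinterpret_cast<void**>(&dB), sizeof(float) * k * n) == cudaSuccess &&
        cudaMalloc(reinterpret_cast<void**>(&dC), sizeof(float) * m * n) == cudaSuccess &&
        cublasCreate(&handle) == CUBLAS_STATUS_SUCCESS &&
        cublasSetMatrix(m, k, sizeof(float), A.data(), m, dA, m) == CUBLAS_STATUS_SUCCESS &&
        cublasSetMatrix(k, n, sizeof(float), B.data(), k, dB, k) == CUBLAS_STATUS_SUCCESS &&
        cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, m, n, k,
                    &alpha, dA, m, dB, k, &beta, dC, m) == CUBLAS_STATUS_SUCCESS &&
        cublasGetMatrix(m, n, sizeof(float), dC, m, C.data(), m) == CUBLAS_STATUS_SUCCESS;

    if (handle) cublasDestroy(handle);
    cudaFree(dA);                      // cudaFree(nullptr) is a no-op
    cudaFree(dB);
    cudaFree(dC);
    return ok;
#else
    // CPU fallback: naive triple loop standing in for the existing BLAS call.
    for (int j = 0; j < n; ++j)
        for (int p = 0; p < k; ++p)
            for (int i = 0; i < m; ++i)
                C[i + static_cast<std::size_t>(j) * m] +=
                    A[i + static_cast<std::size_t>(p) * m] *
                    B[p + static_cast<std::size_t>(j) * k];
    return true;
#endif
}
```

The idea would be to call something like `gemm(2, 2, 2, A, B, C)` from the existing MFC code in place of the current CPU BLAS call, and to switch backends by defining or not defining the macro in the project settings. Whether this is a sensible way to structure the integration is part of what I'm asking.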
Thanks in advance for the comments.