I have written CUDA code to solve an NP-complete problem, but the performance was not as good as I expected.

I know about "some" optimization techniques (using shared memory, textures, zero-copy, ...).

What are the most important optimization techniques CUDA programmers should know about?

+2  A: 

You should read NVIDIA's CUDA Programming Best Practices guide: http://developer.download.nvidia.com/compute/cuda/3_0/toolkit/docs/NVIDIA_CUDA_BestPracticesGuide.pdf

The guide lists many performance tips, each with an associated "priority". Here are some of the top-priority tips:

  1. Use the effective bandwidth of your device to work out what the upper bound on performance ought to be for your kernel
  2. Minimize memory transfers between host and device, even if that means running calculations on the device that are not efficient there
  3. Coalesce all memory accesses
  4. Prefer shared memory access to global memory access
  5. Avoid divergent branches within a single warp, as the hardware then executes the divergent paths one after another instead of in parallel
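To make tips 1 and 3 concrete, here is a minimal sketch that times a coalesced copy against a strided one and reports the effective bandwidth of each. The kernel names, the `effectiveBandwidthGBs` helper, the buffer size and the stride of 33 are all made up for illustration; the formula (bytes read plus bytes written, divided by elapsed time) is the one the guide uses. Compare the printed numbers with your device's theoretical peak to see how far off a kernel is.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Coalesced: thread i touches element i, so a warp reads one contiguous span.
__global__ void copyCoalesced(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i];
}

// Strided (hypothetical counter-example): neighbouring threads touch elements
// `stride` apart, so a warp's accesses are scattered across memory.
__global__ void copyStrided(const float* in, float* out, int n, int stride) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        int j = (int)(((long long)i * stride) % n);
        out[j] = in[j];
    }
}

// Effective bandwidth in GB/s: (bytes read + bytes written) / 1e9 / seconds.
double effectiveBandwidthGBs(size_t bytesRead, size_t bytesWritten, float ms) {
    return (double)(bytesRead + bytesWritten) / 1e9 / (ms / 1e3);
}

int main() {
    const int n = 1 << 24;   // 16M floats per buffer (assumed size)
    const int stride = 33;   // odd and n is a power of two, so every element is visited once
    const size_t bytes = n * sizeof(float);

    float *in = nullptr, *out = nullptr;
    if (cudaMalloc(&in, bytes) != cudaSuccess ||
        cudaMalloc(&out, bytes) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }
    cudaMemset(in, 0, bytes);

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    float ms = 0.0f;

    copyCoalesced<<<blocks, threads>>>(in, out, n);  // warm-up launch, not timed

    cudaEventRecord(start);
    copyCoalesced<<<blocks, threads>>>(in, out, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    cudaEventElapsedTime(&ms, start, stop);
    printf("coalesced: %.1f GB/s\n", effectiveBandwidthGBs(bytes, bytes, ms));

    cudaEventRecord(start);
    copyStrided<<<blocks, threads>>>(in, out, n, stride);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    cudaEventElapsedTime(&ms, start, stop);
    printf("strided:   %.1f GB/s\n", effectiveBandwidthGBs(bytes, bytes, ms));

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(in);
    cudaFree(out);
    return 0;
}
```

Both kernels move the same number of bytes, so any gap between the two figures comes from the access pattern alone. How large the gap is depends on the GPU generation and its caches.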
Edric
6. Avoid bank conflicts.

P.S. In my application, I found that using statically allocated shared memory is faster than using dynamically allocated shared memory (with kernels<<<blocks, threads, sharedMemSize>>>()).

All of this is described in the Best Practices guide.
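For anyone unsure what the two allocation styles look like, here is a small sketch that reverses a 64-element array in shared memory both ways. The kernel names, the `reverseMatches` helper and the array size are invented for the example. It only shows the syntax and checks that both variants give the same result; whether one is faster is something you would have to measure in your own kernel.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int N = 64;  // assumed array size, also the block size

// Static allocation: the size is fixed at compile time.
__global__ void reverseStatic(int* d) {
    __shared__ int s[N];
    int t = threadIdx.x;
    s[t] = d[t];
    __syncthreads();
    // Threads of a warp read consecutive 4-byte words, so each hits a
    // different bank and there are no bank conflicts.
    d[t] = s[N - 1 - t];
}

// Dynamic allocation: the size is the third launch parameter.
__global__ void reverseDynamic(int* d, int n) {
    extern __shared__ int s[];
    int t = threadIdx.x;
    s[t] = d[t];
    __syncthreads();
    d[t] = s[n - 1 - t];
}

// Runs one variant and checks on the host that the array was reversed.
bool reverseMatches(bool useDynamic) {
    int h[N];
    for (int i = 0; i < N; ++i) h[i] = i;

    int* d = nullptr;
    if (cudaMalloc(&d, N * sizeof(int)) != cudaSuccess) return false;
    cudaMemcpy(d, h, N * sizeof(int), cudaMemcpyHostToDevice);

    if (useDynamic) {
        reverseDynamic<<<1, N, N * sizeof(int)>>>(d, N);
    } else {
        reverseStatic<<<1, N>>>(d);
    }

    cudaError_t err = cudaMemcpy(h, d, N * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d);
    if (err != cudaSuccess) return false;

    for (int i = 0; i < N; ++i) {
        if (h[i] != N - 1 - i) return false;
    }
    return true;
}

int main() {
    printf("static:  %s\n", reverseMatches(false) ? "ok" : "FAILED");
    printf("dynamic: %s\n", reverseMatches(true) ? "ok" : "FAILED");
    return 0;
}
```

With the static version the compiler knows the array size, which may let it optimize indexing more aggressively. The dynamic version lets one kernel serve several block sizes.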
LonliLokli