What would happen if four CUDA applications were competing for resources on a single GPU at the same time, each offloading work to the graphics card? The CUDA Programming Guide 3.1 mentions that certain operations are asynchronous:
- Kernel launches
- Device ↔ device memory copies
- Host ↔ device memory copies of a memory block of 64 KB or less
- Memory copies performed by functions that are suffixed with Async (see the sketch after this list)
- Memory set function calls
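For reference, this is what I understand by an Async-suffixed copy. It's a minimal sketch of my own (the buffer names and sizes are placeholders, not from the guide), assuming a pinned host buffer and one explicitly created stream: the call returns to the host thread right away and the transfer completes in the background.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    const size_t N = 1 << 20;
    float *h_data, *d_data;

    // Pinned (page-locked) host buffer; Async copies need this to overlap with host work.
    cudaMallocHost((void**)&h_data, N * sizeof(float));
    cudaMalloc((void**)&d_data, N * sizeof(float));

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // Returns to the host thread immediately; the copy proceeds in the background.
    cudaMemcpyAsync(d_data, h_data, N * sizeof(float), cudaMemcpyHostToDevice, stream);

    // The host is free to do other work here while the copy is in flight.

    cudaStreamSynchronize(stream);   // wait for the copy to finish
    printf("copy complete\n");

    cudaStreamDestroy(stream);
    cudaFree(d_data);
    cudaFreeHost(h_data);
    return 0;
}
```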
It also mentions that devices of compute capability 2.0 can execute multiple kernels concurrently, as long as the kernels belong to the same context.
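And this is the kind of same-context stream concurrency I take the guide to be describing, again a sketch I wrote purely for illustration (busyKernel and the launch sizes are made up): two kernels launched into different streams of the same context, which a compute capability 2.0 device may run side by side.

```cuda
#include <cuda_runtime.h>

// Dummy kernel that just keeps the SMs busy for a while.
__global__ void busyKernel(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = data[i];
        for (int k = 0; k < 1000; ++k) v = v * 1.0001f + 0.0001f;
        data[i] = v;
    }
}

int main() {
    const int N = 1 << 16;
    float *d_a, *d_b;
    cudaMalloc((void**)&d_a, N * sizeof(float));
    cudaMalloc((void**)&d_b, N * sizeof(float));

    cudaStream_t s1, s2;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);

    // Both launches are asynchronous; with free resources the GPU may run them concurrently.
    busyKernel<<<N / 256, 256, 0, s1>>>(d_a, N);
    busyKernel<<<N / 256, 256, 0, s2>>>(d_b, N);

    cudaDeviceSynchronize();

    cudaStreamDestroy(s1);
    cudaStreamDestroy(s2);
    cudaFree(d_a);
    cudaFree(d_b);
    return 0;
}
```

What I want to know is whether anything comparable is possible when the two launches come from two separate processes instead of two streams in one process.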
Does this type of concurrency only apply to streams within a single CUDA application, and is it impossible when completely different applications are requesting GPU resources?
Does that mean that concurrency support is only available within one application (one context?), and that the four applications will run "concurrently" only in the sense that their calls are interleaved by context switching on the CPU, while each application has to wait for the GPU to be freed by the others? (i.e. a kernel launch from app4 waits until a kernel launch from app1 finishes.)
If that is the case, how can these four applications access GPU resources without suffering long waiting times?