views: 80
answers: 2

Is there a method to share the GPU between two separate OpenCL-capable programs, or more specifically, between two separate processes that both require the GPU at the same time to execute OpenCL kernels? If so, how is this done?

+2  A: 

It depends on what you call sharing.

In general, two separate processes can each create an OpenCL context on the same GPU. It's then the driver/OS/GPU's responsibility to make sure things just work.

That said, most implementations will time-slice GPU execution to make that happen (just as happens for graphics).
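For illustration, here is a minimal host-side sketch (error handling trimmed, assuming one OpenCL platform with at least one GPU device) of the setup each of the two processes would run independently; nothing special is needed to "share":

```c
/* Each process runs this same setup on its own; the driver/OS
 * arbitrate access to the single physical GPU between them. */
#include <CL/cl.h>
#include <stdio.h>

int main(void)
{
    cl_platform_id platform;
    cl_device_id   device;
    cl_int         err;

    /* Take the first platform and its first GPU device. */
    err  = clGetPlatformIDs(1, &platform, NULL);
    err |= clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);
    if (err != CL_SUCCESS) { fprintf(stderr, "no GPU device\n"); return 1; }

    /* Both processes can hold a context on the same device at once;
     * kernel execution is then time-sliced between them. */
    cl_context ctx = clCreateContext(NULL, 1, &device, NULL, NULL, &err);
    cl_command_queue queue = clCreateCommandQueue(ctx, device, 0, &err);

    /* ... build programs and enqueue kernels as usual ... */

    clReleaseCommandQueue(queue);
    clReleaseContext(ctx);
    return 0;
}
```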

I sense this is not exactly what you're after, though. Can you expand your question with a use case?

Bahbar
The use case would be a single process using OpenCL to utilize the GPU for a lengthy period of time. If another OpenCL process that also requires the GPU is launched, how is this handled? Your answer suggests that the driver will time-slice, i.e. allocate the whole GPU to one process at any given time. Surely this is inefficient, given that data associated with each process must be copied back and forth between the device and the host?
Chris
@Chris: The short answer is that not all the data needs to be copied. Execution is time-sliced, but the memory can stay shared if both fit on the card (memory is "virtualized"; see the URL below). If it does _not_ fit, then data will certainly be copied around. The granularity of the time-slicing is coarse, too. Assuming we're talking about Windows: under WDDM 1.1, a CL kernel run never gets interrupted, and more. See http://en.wikipedia.org/wiki/Windows_Display_Driver_Model for more info.
Bahbar
In particular: http://download.microsoft.com/download/5/b/9/5b97017b-e28a-4bae-ba48-174cf47d23cd/PRI103_WH06.ppt
Bahbar
The ppt was really interesting. How will more fine-grained time-slicing be implemented under Linux?
Chris
@Chris: No idea. What is certain, though, is that the GPU hardware, at least up to the current generation, does not really support arbitrary time-slicing yet (Fermi might be an exception, I'm not sure). So the OS can't provide what the hardware does not support.
Bahbar
Ok, thanks for the discussion. It seems Fermi has support for better context switching and concurrent kernel execution: http://www.nvidia.com/content/PDF/fermi_white_papers/NVIDIA_Fermi_Compute_Architecture_Whitepaper.pdf
Chris
+1  A: 

Current GPUs (except NVIDIA's Fermi) do not support simultaneous execution of more than one kernel. Moreover, to date GPUs do not support preemptive multitasking; it's completely cooperative! A kernel's execution cannot be suspended and resumed later. So the granularity of any time-based GPU sharing depends on the kernels' execution times.

If you have multiple programs running that require GPU access, you should therefore make sure that your kernels have short runtimes (< 100 ms is a rule of thumb), so that GPU time can be time-sliced among the kernels that want GPU cycles. This is also important because otherwise the host system's graphics will become very unresponsive, since they need GPU access too. It can go so far that a kernel stuck in an endless or very long loop will appear to crash the system. One way to keep launches short is sketched below.
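If the natural kernel would run longer than that, one hedged workaround (a sketch only; the CHUNK size and the kernel's base-offset argument are illustrative assumptions, not from this answer) is to split a large 1D workload into several short enqueues, so the driver gets a scheduling point between launches:

```c
/* Sketch: run one big 1D workload as a series of short launches.
 * Assumes the kernel takes a cl_uint base offset as argument 0
 * and indexes its data as base + get_global_id(0). */
#include <CL/cl.h>

#define CHUNK (64 * 1024) /* items per launch; tune so each stays well under ~100 ms */

void run_chunked(cl_command_queue queue, cl_kernel kernel, size_t total)
{
    for (size_t base = 0; base < total; base += CHUNK) {
        size_t count = (total - base < CHUNK) ? (total - base) : CHUNK;
        cl_uint off  = (cl_uint)base;

        clSetKernelArg(kernel, 0, sizeof(off), &off);
        clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &count, NULL,
                               0, NULL, NULL);
        /* Waiting here keeps the driver from batching the launches,
         * leaving gaps where other processes (and the desktop
         * compositor) can get the GPU. */
        clFinish(queue);
    }
}
```

Each launch boundary is a point where the GPU can be handed to another process, which is exactly the cooperative behaviour described above.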

dietr