views: 1256
answers: 1

I've been playing with the ATI OpenCL implementation in their Stream 2.0 beta. The OpenCL in the current beta only uses the CPU for now; the next version is supposed to support GPU kernels. I downloaded Stream because I have an ATI GPU in my work machine.

I write software that would benefit hugely from using the GPU. However, this software runs on customer machines; I don't have the luxury (as many scientific computing environments do) of choosing the exact hardware to develop for and optimizing for that. So my question is: if I distribute the ATI OpenCL implementation with my application, does that mean it will never be able to use e.g. NVidia video cards? And if I use the NVidia OpenCL SDK, does that mean it will never run optimally on AMD chips (considering the ATI/AMD link)?

In other words, who is ultimately responsible for providing the OpenCL implementation? Will users be able to e.g. install an OpenCL 'driver' for their NVidia video card, alongside a 'driver' that gives them optimal performance on their AMD CPU?

As an aside, are there any good/active support forums for OpenCL apart from the Khronos message boards, or is that the place to go? I've seen that ATI has a board, and NVidia presumably has its own; where does the OpenCL user/developer community hang out? Has it already consolidated into one place?

+3  A: 

Ultimately, OpenCL will work the same way as OpenGL: users will install the current drivers from their hardware vendors (ATI, NVIDIA, Intel), and you as the developer will simply link against an OpenCL library when building your application. When users run your application, its OpenCL calls will be dispatched to the appropriate vendor-specific libraries provided by those drivers.

That is how it will eventually work; it does not work that way yet.
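To make that concrete, here is a minimal sketch (assuming the standard OpenCL 1.x C host API and a vendor-neutral OpenCL library to link against) of how an application would discover whichever implementations the user's drivers expose at run time:

    /* Minimal sketch: list the OpenCL platforms exposed by whatever
       vendor drivers are installed on the user's machine. */
    #include <stdio.h>
    #include <CL/cl.h>

    int main(void)
    {
        cl_platform_id platforms[8];
        cl_uint count = 0;

        /* Each installed vendor implementation shows up as a platform. */
        if (clGetPlatformIDs(8, platforms, &count) != CL_SUCCESS)
            return 1;

        for (cl_uint i = 0; i < count && i < 8; ++i) {
            char vendor[256];
            clGetPlatformInfo(platforms[i], CL_PLATFORM_VENDOR,
                              sizeof vendor, vendor, NULL);
            printf("platform %u: %s\n", i, vendor);
        }
        return 0;
    }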

Another important thing to keep in mind is that you will probably still have to provide vendor-specific code paths, since code running on the CPU through OpenCL will likely need different optimized kernel parameters than code running on the GPU. The same probably holds for differences between GPU vendors.
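As a rough illustration of what such a per-device code path might look like (the 64/1 values below are placeholder guesses, not recommendations), the host code can query each device and clamp its launch parameters accordingly:

    /* Sketch: choose a work-group size per device. CPU and GPU
       implementations report very different limits, so the parameters
       are picked (and clamped) per device rather than hard-coded. */
    #include <CL/cl.h>

    static size_t pick_local_size(cl_device_id dev)
    {
        cl_device_type type;
        size_t max_wg = 1;

        clGetDeviceInfo(dev, CL_DEVICE_TYPE, sizeof type, &type, NULL);
        clGetDeviceInfo(dev, CL_DEVICE_MAX_WORK_GROUP_SIZE,
                        sizeof max_wg, &max_wg, NULL);

        /* Placeholder heuristic: wide groups on GPUs, tiny groups on CPUs. */
        size_t wanted = (type & CL_DEVICE_TYPE_GPU) ? 64 : 1;
        return wanted < max_wg ? wanted : max_wg;
    }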

Eric
The difference with OpenGL is that for OpenGL, the GPU vendor writes the drivers, period; OpenGL only runs on the video card. But for OpenCL, ideally the CPU vendor writes the driver for CPU kernels and the GPU vendor writes the driver for GPU kernels, since OpenCL kernels can run on CPU threads or GPU threads. Is this how it is supposed to work in the future?
Roel
OpenGL always supports a software path for when the hardware doesn't support certain operations; as such, OS vendors must supply a software OpenGL implementation (Microsoft's on Windows is stuck at OpenGL 1.1). Something similar will probably happen with OpenCL. In any case, AMD/ATI will likely release a version of OpenCL that supports both their CPUs and their GPUs. Similarly, Intel will likely release an OpenCL that supports their regular CPUs and the Larrabee GPUs. I don't know enough about Apple's OpenCL implementation to know what it supports.
Eric
OK, so can I conclude from that that if a customer has an ATI video card and an Intel CPU, they won't get optimal performance? That, depending on which OpenCL driver/implementation they have installed, they will run kernels either on the CPU or on the GPU? I know it will probably *run* on the machine, that's not my concern; my concern is whether it will run *fast*, i.e. using all the hardware on the machine, all CPU cores and all GPU 'cores'.
Roel
The short answer is that it's too early to tell, especially in cross-vendor scenarios. Also, there can be orders of magnitude of difference between using all hardware and using all hardware optimally. Catering to the memory architecture and to the optimal work-group size on the different platforms will be of critical importance for getting the maximum performance out of your application. Even if you target only AMD CPUs and GPUs, you will probably need to tune your kernel parameters for each to get the best performance.
Eric
Also, I think you're prematurely optimizing now. OpenCL is "the way of the future" if you want cross-platform, high-performance computing. Focus on learning the details now and optimizing for your current platform. Then later, you can worry about multiple vendors/platforms.
Eric
Well, I don't agree with you there. If I need to wait another year or two before drivers are on Windows (the only platform I care about), and in the meantime I cannot count on getting performance on all (or most) CPUs and GPUs, I'm better off developing directly for CUDA and telling my customers 'it will only run fast with an nVidia card'. CUDA is more mature than OpenCL, has better tools, and is optimized for one hardware platform; the only reason to choose OpenCL *now* is cross-vendor hardware support. If that is flaky, most of the benefit of OpenCL (for me) is gone. Anyway, thanks for your insights.
Roel
I think you're missing the point I'm trying to make. Even once OpenCL is cross-platform, you'll still have to do a lot of custom development. It's best to start that development now on some platform it can run optimally on in the meantime. If you do choose CUDA, which is a good platform, and you plan to switch to supporting ATI cards later, then I recommend using the low-level 'Driver API', since it is closer to the OpenCL API.
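For what it's worth, here is a rough sketch of that Driver API style (the module and kernel names are hypothetical); like OpenCL, it deals in explicit contexts, modules, and kernel handles, which is what makes a later port more mechanical:

    /* Sketch of the CUDA Driver API setup flow, with the rough OpenCL
       counterpart of each call noted. "kernels.ptx" and "my_kernel"
       are made-up names for illustration only. */
    #include <cuda.h>

    int setup(void)
    {
        CUdevice   dev;
        CUcontext  ctx;
        CUmodule   mod;
        CUfunction fn;

        cuInit(0);
        cuDeviceGet(&dev, 0);                        /* ~ clGetDeviceIDs     */
        cuCtxCreate(&ctx, 0, dev);                   /* ~ clCreateContext    */
        cuModuleLoad(&mod, "kernels.ptx");           /* ~ clCreateProgram... */
        cuModuleGetFunction(&fn, mod, "my_kernel");  /* ~ clCreateKernel     */
        return 0;
    }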
Eric
I think we're arguing different cases here. I realize that, to get /optimal/ performance from OpenCL, I would need to tweak the kernels for different targets: GPU, CPU, specific vendors, and so on. The reason for me to consider OpenCL first (over CUDA or Stream or other vendor-specific APIs) is that it would allow me to write parallelized code that runs on all GPUs and all CPU cores, i.e. maximum parallelism. (I agree that my earlier phrase 'optimal performance' was an overstatement; what I meant was basic parallelism, as that alone would give my specific use case a significant speed boost.)
Roel
(Stupid 600-character limit.) But if OpenCL will not give me even basic parallelism in the cases where customers have GPU/CPU combinations from different vendors, then I may be better off giving up on a universal approach and going for 'one vendor' (e.g. nvidia) support. If I do that, I could at least really optimize for that platform (if I choose OpenCL, I won't be writing custom kernels for each GPU generation; I would have to settle for one 'good enough' approach because of time/budget constraints).
Roel
In the end, this is an optimization problem, and most of the variables are ones I can't even measure (how many of my customers will have mixed-vendor CPU/GPU combos? How will OpenCL implementation quality turn out? How much time would I spend optimizing my code for CUDA chipsets? etc.). I'm just saying that the 'OpenCL vs. CUDA vs. Stream' question isn't the clear-cut choice it would be if OpenCL, in the future, could make optimal use of all hardware regardless of vendor.
Roel
You would have been better off posting an answer than a comment.
whatnick