I am having trouble using the *__constant* qualifier in my OpenCL kernels. My platform is Snow Leopard.

I have tried creating a read-only CL memory object on the GPU and copying my constant array from the host into it. Then I set the kernel argument just as I do for *__global* memory arguments, but it does not work as it should, and I see no errors or warnings. I have also tried passing the data directly to clSetKernelArg, as with float and int arguments; that does not work either.

Am I making a mistake, or is there something wrong with Apple's implementation? I would like to see a working example of how this is done, with both the OpenCL (GPU) code and the host code.

+3  A: 

I doubt there is anything so fundamentally wrong with Apple's implementation. I used the following OpenCL Hello World Example application to get my head around the basics.

In this example I replaced the __global float* input with __constant float* input and it worked fine. You also need to make sure your buffer is CL_MEM_READ_ONLY, using something like clCreateBuffer(context, CL_MEM_READ_ONLY, sizeof(float) * count, NULL, NULL).

From reading the spec, I think __constant => __global + CL_MEM_READ_ONLY.

I'm running Snow Leopard on MBP 15".

alexr
+2  A: 

There are some bugs with the way Apple's OpenCL compiler handles __constant variables on the GPU. If the compiler log says something like

OpenCL Build Error : Compiler build log:
Error while compiling the ptx module: CLH_ERROR_NO_BINARY_FOR_GPU
PTX Info log: 
PTX Error log:

then I had the same error as you and filed a bug for it. The folks at Apple marked it as a duplicate (apparently of rdar://7217974), so I assume it is a known problem that they are working on.

ianh
+1  A: 

"From reading the spec, I think __constant => __global + CL_MEM_READ_ONLY."

Not really. When you specify __constant instead of __global, you are telling the device to store this data in a different region of memory. On some devices the two may be the same, but on others they are not. On NVIDIA cards, for instance, you have only 64 KB of __constant memory but many MB of __global memory. The advantage of __constant is that on NVIDIA devices it is cached :)

You can query your device for these limits (example output from my device):

CL_DEVICE_MAX_MEM_ALLOC_SIZE: 128 MByte

CL_DEVICE_GLOBAL_MEM_SIZE: 255 MByte

CL_DEVICE_LOCAL_MEM_SIZE: 16 KByte

CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE: 64 KByte

Vando