
Q: Does Global Work Size (Dimensions) Need to be Multiple of Work Group Size (Dimensions) in OpenCL?

If so, is there a standard way of handling matrices whose dimensions are not a multiple of the work group dimensions? I can think of two possibilities:

1. Dynamically set the work group dimensions to a factor of the global work dimensions. (This would incur the overhead of finding a factor and could leave the work group at a non-optimal size.)

2. Round the global work dimensions up to the nearest multiple of the work group dimensions, keeping all input and output buffers the same size but checking bounds in the kernel to avoid segfaulting, i.e. doing nothing in work-items that fall outside the desired output. (This seems like the better way.)

Would the second way work? Is there a better way? (Or is it not necessary because work group dimensions need not divide global work dimensions?)

Thanks!

A:

From what I saw in the standard, it doesn't have to be. I think I would handle it with a branch in the kernel, but I don't know exactly what kind of matrix operation you are doing.

http://www.khronos.org/registry/cl/specs/opencl-1.1.pdf#page=131

global_work_size points to an array of work_dim unsigned values that describe the number of global work-items in work_dim dimensions that will execute the kernel function. The total number of global work-items is computed as global_work_size[0] * ... * global_work_size[work_dim - 1].

The values specified in global_work_size + corresponding values specified in global_work_offset cannot exceed the range given by the sizeof(size_t) for the device on which the kernel execution will be enqueued. The sizeof(size_t) for a device can be determined using CL_DEVICE_ADDRESS_BITS in table 4.3. If, for example, CL_DEVICE_ADDRESS_BITS = 32, i.e. the device uses a 32-bit address space, size_t is a 32-bit unsigned integer and global_work_size values must be in the range 1 .. 2^32 - 1. Values outside this range return a CL_OUT_OF_RESOURCES error.

Chad Brewbaker
A:

Thanks for the link, Chad. But if you read on:

If local_work_size is specified, the values specified in global_work_size[0], … global_work_size[work_dim - 1] must be evenly divisible by the corresponding values specified in local_work_size[0], … local_work_size[work_dim - 1].

So YES: when a local work size is specified, the global work size must be a multiple of the local work size in every dimension.

I also think rounding the global work size up to the nearest multiple and checking bounds in the kernel should work. I'll post a comment when I get around to trying it.

Junier