views:

9

answers:

1

The vector addition example has this code:

// Asynchronous write of data to GPU device
ciErr1 = clEnqueueWriteBuffer(cqCommandQueue, cmDevSrcA, CL_FALSE, 0, sizeof(cl_float) * szGlobalWorkSize, srcA, 0, NULL, NULL);
ciErr1 |= clEnqueueWriteBuffer(cqCommandQueue, cmDevSrcB, CL_FALSE, 0, sizeof(cl_float) * szGlobalWorkSize, srcB, 0, NULL, NULL);
shrLog("clEnqueueWriteBuffer (SrcA and SrcB)...\n"); 
if (ciErr1 != CL_SUCCESS)
{
    shrLog("Error in clEnqueueWriteBuffer, Line %u in file %s !!!\n\n", __LINE__, __FILE__);
    Cleanup(EXIT_FAILURE);
}

// Launch kernel
ciErr1 = clEnqueueNDRangeKernel(cqCommandQueue, ckKernel, 1, NULL, &szGlobalWorkSize, &szLocalWorkSize, 0, NULL, NULL);
shrLog("clEnqueueNDRangeKernel (VectorAdd)...\n"); 
if (ciErr1 != CL_SUCCESS)

It launches the kernel right afterwards. How does this not cause problems? We aren't guaranteeing that the graphics memory buffers have been fully written to when the kernel launches right?

+1  A: 

I while the writes are asynchroneous from a hosts point of view, they aren't necessarily asynchroneous from thee devices point of view. I'd assume that the commandqueue is created without CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE, so it's an in-order commandqueue.

The opemcl specification says the following about in-order execution:

In-order Execution: Commands are launched in the order they appear in the command- queue and complete in order. In other words, a prior command on the queue completes before the following command begins. This serializes the execution order of commands in a queue.

Therefore the writes should complete before the kernel is executed on the device.

Grizzly