ansaurus

Question

Strange behaviour using local memory in OpenCL

Answer 1

A:

Firstly, and importantly, you need to be careful that itemcount is a multiple of the local work size to avoid divergence when executing the barrier.

The OpenCL specification says of the barrier function: "All work-items in a work-group executing the kernel on a processor must execute this function before any are allowed to continue execution beyond the barrier. This function must be encountered by all work-items in a work-group executing the kernel."

You could implement this as follows:

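// Round itemcount up to a multiple of the local work size so that all
// work-items of a work-group run the same number of loop iterations.
unsigned int itemcountrounded = get_local_size(0) * ((itemcount + get_local_size(0) - 1) / get_local_size(0));
for (unsigned int id = get_global_id(0); id < itemcountrounded; id += globalsize, groupid += groupcount)
{
    // ...
    // Work-items in the padding range still loop, but must not write.
    if (id < itemcount)
        result[id] = (float) offset;
}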
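To see why the rounding matters, here is a small host-side C sketch of the same arithmetic. It is only an illustration under assumed sizes, not part of the kernel: the function names `round_up_to_multiple` and `loop_trips` are made up for this example, and it assumes the loop stride (`globalsize` above) is the global work size and therefore a multiple of the local work size.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: round item_count up to the next multiple of
 * local_size, using the same integer arithmetic as the kernel snippet. */
size_t round_up_to_multiple(size_t item_count, size_t local_size)
{
    assert(local_size > 0);
    return local_size * ((item_count + local_size - 1) / local_size);
}

/* Hypothetical helper: number of iterations executed by
 *     for (id = first_id; id < bound; id += stride)
 * i.e. how many times one work-item would reach a barrier in the loop body. */
size_t loop_trips(size_t first_id, size_t bound, size_t stride)
{
    assert(stride > 0);
    if (first_id >= bound)
        return 0;
    return (bound - first_id + stride - 1) / stride;
}
```

For example, with 10 items, a local size of 4 and a global size of 8, work-items 1 and 2 sit in the same work-group, yet `loop_trips(1, 10, 8)` is 2 while `loop_trips(2, 10, 8)` is 1, so they would hit the barrier a different number of times. With the bound rounded up to `round_up_to_multiple(10, 4)`, which is 12, every work-item in that group runs 2 iterations.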

You said the code was reduced for simplicity. What happens if you run exactly what you posted? I'm also wondering whether you need the barrier to fence global memory as well.

Tom 2010-02-01 20:49:21

What I meant by "reduced for simplicity" was that what I'm really trying to do isn't to duplicate the groupid into every vector entry. What I posted as results was the outcome of running the posted kernel (at least of one run; the incorrect entries seem to vary from run to run). I've already ensured that itemcount is a multiple of the local work size; however, in my tests it doesn't matter either way (the behaviour is basically the same whether or not itemcount is divisible by the local work size).

Grizzly 2010-02-02 16:52:00

Have you tried putting the barrier on global memory too? i.e. `barrier(CLK_LOCAL_MEM_FENCE | CLK_GLOBAL_MEM_FENCE)`

Tom 2010-02-02 21:15:06
