tags:

views:

105

answers:

3

Is it possible to create a list or array in a device function whose size is a parameter of the call, or a global variable that is initialized at call time?

I would like something like one of these lists to work:

unsigned int size1;

__device__ void function(int size2) {
    int list1[size1];
    int list2[size2];
}

Is it possible to do something smart to make something like this work?

A: 

If you know which values of size to expect, consider using C++ templates. Together with the Boost preprocessor you can easily generate multiple instances/entry points.
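A minimal sketch of the template idea (not from the original answer; the names are made up): the size becomes a template parameter, so inside the kernel it is a compile-time constant and a plain local array is legal. The host code dispatches the run-time size to a matching instantiation.

```cuda
// Sketch: array size as a template parameter (compile-time constant).
template <int SIZE>
__device__ void function()
{
    int list[SIZE]; // per-thread local array, size fixed at compile time
    for (int i = 0; i < SIZE; ++i)
        list[i] = i;
    // . . .
}

template <int SIZE>
__global__ void kernel() { function<SIZE>(); }

// Host side: dispatch the run-time size to one of the instantiations.
void launch(int size, dim3 grid, dim3 block)
{
    switch (size) {
        case 8:  kernel<8><<<grid, block>>>();  break;
        case 16: kernel<16><<<grid, block>>>(); break;
        case 32: kernel<32><<<grid, block>>>(); break;
        // other sizes would need their own case or a fallback
    }
}
```

The switch is where Boost.Preprocessor helps: it can generate the case list for a whole range of sizes instead of writing each one by hand.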

The other thing you can do is dynamically allocate shared memory and assign pointers into it manually. Obviously this may not work if you require thread-private memory in excess of the available shared memory.

I can provide you with a link if you would like to see an example.

aaa
Well... I have a tree and need to do a kNN search in it. For that I need a way to keep track of my position in the tree, and I had thought of an array of size treeheight. Since I know the size of the tree before I make the kernel call that handles the kNN, I sort of know the size. I can't see how templates will help me here, but I would like to see an example of what you mean...
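A hypothetical sketch of that use case (the names and tree layout are invented for illustration): the traversal stack's capacity is the tree height, passed as a template parameter, so each thread gets its own fixed-size stack.

```cuda
// Sketch: per-thread traversal stack sized by a template parameter.
template <int MAX_DEPTH>
__global__ void knnSearch(const int * tree, int root)
{
    int stack[MAX_DEPTH]; // thread-private, lives in registers/local memory
    int top = 0;
    stack[top++] = root;
    while (top > 0) {
        int node = stack[--top];
        // ... visit node, push children while top < MAX_DEPTH ...
        (void)tree; (void)node; // placeholders for the real traversal
    }
}

// Host side: the tree height is known before launch, so pick an
// instantiation at least that large, e.g.:
// if (treeHeight <= 16) knnSearch<16><<<grid, block>>>(tree, root);
// else                  knnSearch<32><<<grid, block>>>(tree, root);
```

Since the height only needs to be an upper bound, a handful of instantiations (16, 32, 64, ...) usually covers every tree you will see.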
SenfMeister
+2  A: 

There is one way to allocate a dynamic amount of shared memory: use the third kernel launch parameter:

__global__ void kernel(int * arr)
{
    extern __shared__ int buf[]; // size is set at launch time
    // copy data to shared mem:
    buf[threadIdx.x] = arr[blockIdx.x * blockDim.x + threadIdx.x];
    // . . .
}
// . . .
// launch kernel, set size of shared mem in bytes (k elements in buf):
kernel<<<grid, threads, k * sizeof(int)>>>(arr);

There is a hack for multiple arrays: only one dynamic shared array can be declared, so carve it up with pointer offsets:

__device__ void function(int * a, int * b, int k) // k elements in the first list
{
    extern __shared__ int smem[];
    int * list1 = smem;     // first k elements
    int * list2 = smem + k; // the rest of the dynamic allocation

    list1[threadIdx.x] = a[blockIdx.x * blockDim.x + threadIdx.x];
    list2[threadIdx.x] = b[blockIdx.x * blockDim.x + threadIdx.x];
    // . . .
}

You must take into account that shared memory is allocated per block, not per thread.

KoppeKTop
Well... that's the thing with shared memory: it's for the whole block. I need a list that is private to a single thread, not shared by a bunch of threads. What I have done is just use a predefined size.
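For truly thread-private, run-time-sized storage, one more option exists on newer hardware: devices of compute capability 2.0 or later (CUDA 3.2+) support calling malloc/free inside a kernel. A hedged sketch, assuming such a device; note that device-side malloc is slow and draws from a limited heap (adjustable via cudaDeviceSetLimit with cudaLimitMallocHeapSize):

```cuda
// Sketch: device-side malloc gives each thread its own buffer.
// Requires compute capability >= 2.0 and CUDA 3.2 or later.
__global__ void kernel(int size)
{
    int * list = (int *)malloc(size * sizeof(int)); // thread-private
    if (list != NULL) {
        for (int i = 0; i < size; ++i)
            list[i] = i;
        // . . . use list . . .
        free(list); // each thread frees what it allocated
    }
}
```

Always check the returned pointer: the device heap is small by default and allocation can fail under load.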
SenfMeister
A: 

Of course it is possible!

Take a look in the source-code of project: http://code.google.com/p/cuda-grayscale/

This function is called from main() and performs grayscale conversion on *gpu_image* based on its width and height: cuda_grayscale(gpu_image, width, height, grid, block);

If you dig a little, you'll find the implementation in kernel_gpu.cu:

__global__ void grayscale(float4* imagem, int width, int height)
{
    const int i = blockIdx.x * (blockDim.x * blockDim.y) + blockDim.x * threadIdx.y + threadIdx.x;

    if (i < width * height)
    {
        float v = 0.3 * imagem[i].x + 0.6 * imagem[i].y + 0.1 * imagem[i].z;
        imagem[i] = make_float4(v, v, v, 0);
    }
}
karlphillip
karlphillip: Isn't that a pointer to globally allocated memory? What I need is to define an array within each kernel. What you link to has each thread perform one operation and access the list at position i.
SenfMeister