tags:

views:

105

answers:

3

Is it possible to create a list or array in a device function whose size is a parameter of the call, or a global variable that is initialized at call time?

I would like something like one of these lists to work:

unsigned int size1;

__device__ void function(int size2) {
    int list1[size1];
    int list2[size2];
}

Is it possible to do something smart to make something like this work?

A: 

If you know which values of size to expect, consider using C++ templates. Together with the Boost preprocessor you can easily generate multiple instances/entry points.
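A minimal sketch of the template idea (not from the original answer; the names are made up): the size becomes a template parameter, so inside the kernel it is a compile-time constant and a plain local array is legal. The host code dispatches the run-time size to a matching instantiation.

```cuda
// Sketch: array size as a template parameter (compile-time constant).
template <int SIZE>
__device__ void function()
{
    int list[SIZE]; // per-thread local array, size fixed at compile time
    for (int i = 0; i < SIZE; ++i)
        list[i] = i;
    // . . .
}

template <int SIZE>
__global__ void kernel() { function<SIZE>(); }

// Host side: dispatch the run-time size to one of the instantiations.
void launch(int size, dim3 grid, dim3 block)
{
    switch (size) {
        case 8:  kernel<8><<<grid, block>>>();  break;
        case 16: kernel<16><<<grid, block>>>(); break;
        case 32: kernel<32><<<grid, block>>>(); break;
        // other sizes would need their own case or a fallback
    }
}
```

The switch is where Boost.Preprocessor helps: it can generate the case list for a whole range of sizes instead of writing each one by hand.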

The other thing you can do is dynamically allocate shared memory and assign pointers into it manually. Obviously this may not work if you require thread-private memory in excess of the available shared memory.

I can provide you with a link if you would like to see an example.

aaa
Well... I have a tree and need to do a kNN search in it. For that I need a way to keep track of my position in the tree, and I had thought of an array of size treeheight. Since I know the size of the tree before I make the kernel call that handles the kNN, I sort of know the size. I can't see how templates will help me here, but I would like to see an example of what you mean...
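A hypothetical sketch of that use case (the names and tree layout are invented for illustration): the traversal stack's capacity is the tree height, passed as a template parameter, so each thread gets its own fixed-size stack.

```cuda
// Sketch: per-thread traversal stack sized by a template parameter.
template <int MAX_DEPTH>
__global__ void knnSearch(const int * tree, int root)
{
    int stack[MAX_DEPTH]; // thread-private, lives in registers/local memory
    int top = 0;
    stack[top++] = root;
    while (top > 0) {
        int node = stack[--top];
        // ... visit node, push children while top < MAX_DEPTH ...
        (void)tree; (void)node; // placeholders for the real traversal
    }
}

// Host side: the tree height is known before launch, so pick an
// instantiation at least that large, e.g.:
// if (treeHeight <= 16) knnSearch<16><<<grid, block>>>(tree, root);
// else                  knnSearch<32><<<grid, block>>>(tree, root);
```

Since the height only needs to be an upper bound, a handful of instantiations (16, 32, 64, ...) usually covers every tree you will see.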
SenfMeister
+2  A: 

There is one way to allocate a dynamic amount of shared memory: use the third kernel launch parameter:

__global__ void kernel(int * arr)
{
    extern __shared__ int buf[]; // size is set at launch time
    // copy data to shared mem:
    buf[threadIdx.x] = arr[blockIdx.x * blockDim.x + threadIdx.x];
    // . . .
}
// . . .
// launch kernel, set size of shared mem in bytes (k elements in buf):
kernel<<<grid, threads, k * sizeof(int)>>>(arr);

There is a hack for multiple arrays: only one dynamic shared array can be declared, so carve it up with pointer offsets:

__device__ void function(int * a, int * b, int k) // k elements in the first list
{
    extern __shared__ int smem[];
    int * list1 = smem;     // first k elements
    int * list2 = smem + k; // the rest of the dynamic allocation

    list1[threadIdx.x] = a[blockIdx.x * blockDim.x + threadIdx.x];
    list2[threadIdx.x] = b[blockIdx.x * blockDim.x + threadIdx.x];
    // . . .
}

You must take into account that shared memory is allocated per block, not per thread.

KoppeKTop
Well... that's the thing with shared memory: it's for the whole block. I need a list that is private to a single thread, not shared by a bunch of threads. What I have done is just use a predefined size.
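For truly thread-private, run-time-sized storage, one more option exists on newer hardware: devices of compute capability 2.0 or later (CUDA 3.2+) support calling malloc/free inside a kernel. A hedged sketch, assuming such a device; note that device-side malloc is slow and draws from a limited heap (adjustable via cudaDeviceSetLimit with cudaLimitMallocHeapSize):

```cuda
// Sketch: device-side malloc gives each thread its own buffer.
// Requires compute capability >= 2.0 and CUDA 3.2 or later.
__global__ void kernel(int size)
{
    int * list = (int *)malloc(size * sizeof(int)); // thread-private
    if (list != NULL) {
        for (int i = 0; i < size; ++i)
            list[i] = i;
        // . . . use list . . .
        free(list); // each thread frees what it allocated
    }
}
```

Always check the returned pointer: the device heap is small by default and allocation can fail under load.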
SenfMeister
A: 

Of course it is possible!

Take a look in the source-code of project: http://code.google.com/p/cuda-grayscale/

This function is called from main() and performs grayscale conversion on *gpu_image* based on its width and height: cuda_grayscale(gpu_image, width, height, grid, block);

If you dig a little, you'll find the implementation in kernel_gpu.cu:

__global__ void grayscale(float4* imagem, int width, int height)
{
    const int i = blockIdx.x * (blockDim.x * blockDim.y) + blockDim.x * threadIdx.y + threadIdx.x;

    if (i < width * height)
    {
        float v = 0.3 * imagem[i].x + 0.6 * imagem[i].y + 0.1 * imagem[i].z;
        imagem[i] = make_float4(v, v, v, 0);
    }
}
karlphillip
karlphillip: Isn't that a pointer to globally allocated memory? What I need is to define an array within each kernel. What you link to has each thread perform one operation and access the list at position i.
SenfMeister