Say I have a kernel:

__global__ void foo(int a, int b)
{
    __shared__ int array[a];
}

It seems a has to be a constant value. I added const in front of int, but it still didn't work out. Any ideas?

__global__ void foo(const int a, const int b)
{
    __shared__ int array[a];
}
+3  A: 

I don't think CUDA or OpenCL lets you dynamically size a shared-memory array declared this way. Use a #define macro instead.

If you need the size to vary on a per-program basis, you can supply it at build time using -D MYMACRO (with OpenCL; I don't know about CUDA). See Bahbar's answer.
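A minimal sketch of the macro approach (the macro name SHMEM_SIZE and the default value of 128 are just illustrative):

// Supply a per-build size with e.g. nvcc -DSHMEM_SIZE=256 ...
#ifndef SHMEM_SIZE
#define SHMEM_SIZE 128
#endif

__global__ void foo(int a, int b)
{
    // the array size is now a compile-time constant
    __shared__ int array[SHMEM_SIZE];
}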

Stringer Bell
+3  A: 

In ISO C++ the size of an array needs to be a so-called constant expression. This is stronger than a const-qualified variable: it basically means a compile-time constant, so the value has to be known at compile time.

In ISO C90 this was also the case. C99 added VLAs (variable-length arrays), which allow the size to be determined at runtime; the sizeof operator applied to a VLA becomes a runtime operator.
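For illustration (plain C++/C, outside CUDA):

const int n = 16;

void f(const int m)
{
    int a[n];   // OK in C++: n is a constant expression
    int b[m];   // error in ISO C++: m is not a constant expression,
                // even though it is const-qualified
                // (C99 accepts b as a VLA; sizeof b is then evaluated at runtime)
}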

I'm not familiar with CUDA or the __shared__ syntax. It's not clear to me why/how you use the term kernel. But I guess the rules are similar w.r.t. constant expressions and arrays.

sellibitze
+2  A: 

I suspect this is a C language question.

If it were C++, you could simply use std::vector.

#include <vector>

void foo( int a, int b )
{
    std::vector<int> array( a );  // heap-allocated, sized at runtime
    // ...
}

If it really is C++, then what C++ features you can use safely may depend on the environment. It's not clear what you mean by "kernel".

Alf P. Steinbach
+4  A: 

While you can't have a dynamically-sized array because of the constraints of the C language (as mentioned in other answers), what you can do in CUDA is something like this:

extern __shared__ float fshared[];

__global__ void testShmem( float * result, unsigned int shmemSize ) {
    // use fshared - shmemSize tells you how many bytes were allocated
    // Note that the following is not a sensible use of shared memory!
    for( unsigned int i = 0; i < shmemSize / sizeof(float); ++i ) {
       fshared[i] = 0;
    }
}

provided you tell CUDA how much shared memory you want at kernel invocation, via the third launch-configuration argument:

testShmem<<<grid, block, 1024>>>( pdata, 1024 );
Edric
A: 

Here's how you can statically allocate a __shared__ array of n values in CUDA using C++ templates:

template <int n>
__global__ void kernel(...)
{
    __shared__ int array[n];
}

const int n = 128;
kernel<n><<<grid_size, block_size>>>(...);

Note that n must be known at compile time for this to work. If it isn't, you must use the approach Edric suggests.

wnbell