Say I have a kernel:

__global__ void foo(int a, int b)
{
    __shared__ int array[a];
}

It seems a has to be a constant value. I added const in front of int, but it still didn't work out. Any idea?

__global__ void foo(const int a, const int b)
{
    __shared__ int array[a];
}
I don't think CUDA or OpenCL lets you dynamically allocate shared memory this way. Use a #define macro instead.

If you need a dynamically sized array on a per-program basis, you can supply it using -D MYMACRO (with OpenCL; I don't know about CUDA). See Bahbar's answer.
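For example, a minimal sketch of the macro approach (ARRAY_SIZE is a hypothetical name; with nvcc the value could be baked in at build time via -DARRAY_SIZE=256):

#ifndef ARRAY_SIZE
#define ARRAY_SIZE 128  // fallback if no -D flag is supplied
#endif

__global__ void foo(int a, int b)
{
    __shared__ int array[ARRAY_SIZE];  // size is a compile-time constant, so this is legal
}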
In ISO C++ the size of an array needs to be a so-called constant expression. This is stronger than a const-qualified variable: it basically means a compile-time constant, so the value has to be known at compile time.
In ISO C90 this was also the case. C99 added VLAs, variable-length arrays, which allow the size to be determined at runtime. The sizeof operator for these VLAs becomes a runtime operator.
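To illustrate the difference, a small plain C++ sketch (independent of CUDA):

void f(const int a)
{
    // int x[a];     // ill-formed in ISO C++: a is const-qualified,
                     // but not a constant expression (C99 accepts it as a VLA)
}

void g()
{
    const int n = 8; // initialized with a compile-time constant
    int y[n];        // OK: here n is a constant expression
    (void)y;         // silence the unused-variable warning
}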
I'm not familiar with CUDA or the __shared__ syntax, and it's not clear to me why/how you use the term kernel, but I guess the rules are similar w.r.t. constant expressions and arrays.
I suspect this is a C language question.
If it were C++, you could simply use std::vector:

#include <vector>

void foo( int a, int b )
{
    std::vector<int> array( a );  // size chosen at runtime
    // ...
}

If it really is C++, then what C++ features you can use safely may depend on the environment. It's not clear what you mean by "kernel".
While you can't have a dynamically sized array because of the constraints of the C language (as mentioned in the other answers), what you can do in CUDA is something like this:
extern __shared__ float fshared[];

__global__ void testShmem( float * result, unsigned int shmemSize ) {
    // use fshared - shmemSize tells you how many bytes
    // Note that the following is not a sensible use of shared memory!
    for( int i = 0; i < shmemSize/sizeof(float); ++i ) {
        fshared[i] = 0;
    }
}
providing you tell CUDA how much shared memory you want during kernel invocation, like so:
testShmem<<<grid, block, 1024>>>( pdata, 1024 );
Here's how you can statically allocate a __shared__ array of n values in CUDA using C++ templates:
template <int n>
__global__ void kernel(...)
{
    __shared__ int array[n];  // n is a compile-time constant, so this is legal
}

const int n = 128;
kernel<n><<<grid_size,block_size>>>(...);
Note that n must be a known constant at compile time for this to work. If n is not known at compile time then you must use the approach Edric suggests.