views: 457
answers: 3

Hi, I just wanted to know whether it is possible to do the following inside an NVIDIA CUDA kernel:

 __global__ void compute(long *c1, long size, ...)
 {
  ...
  long d[1000];
  ...
 }

or the following:

 __global__ void compute(long *c1, long size, ...)
 {
  ...
  long d[size];
  ...
 }
+1  A: 

You can do the first example; I haven't tried the second.

However, if you can help it, you might want to redesign your program to avoid this. You do not want to allocate 4000 bytes of memory per thread in your kernel. That will lead to heavy use of CUDA local memory, since you will not be able to fit everything into registers. CUDA local memory is slow (around 400 cycles of memory latency).
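
For instance, a minimal sketch of one such redesign (hypothetical, not from the question): if the threads in a block can share one copy of the array, it can live in fast on-chip shared memory instead of being duplicated in every thread's local memory.

 __global__ void compute(long *c1, long size)
 {
  // one shared copy per block instead of a 1000-element
  // local array in every thread
  __shared__ long d[1000];

  // threads cooperatively fill the array
  for (int i = threadIdx.x; i < 1000; i += blockDim.x)
   d[i] = c1[i];

  __syncthreads();  // wait until every element is written

  // ... block-wide work on d[] ...
 }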

tkerwin
A: 

You can allocate shared memory dynamically when you launch the kernel:

 __global__ void compute(long *c1, long size, ...)
 {
  ...
  extern __shared__ float shared[];
  ...
 }

 compute<<<dimGrid, dimBlock, sharedMemSize>>>(blah blah);

CUDA programming guide: "the size of the array is determined at launch time (see Section 4.2.3)."
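
Putting the pieces together for the question's long array, a minimal sketch (assuming size elements fit in one block's shared memory, 16 KB on early devices); note that the third launch parameter counts bytes, not elements:

 __global__ void compute(long *c1, long size)
 {
  extern __shared__ long d[];  // length fixed at launch time

  if (threadIdx.x < size)
   d[threadIdx.x] = c1[threadIdx.x];
  __syncthreads();
  // ... work on d[] ...
 }

 // host side:
 size_t sharedMemSize = size * sizeof(long);
 compute<<<dimGrid, dimBlock, sharedMemSize>>>(c1, size);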

crick3r
A: 

You can do #1, but beware that the allocation happens in EVERY thread!

Your second snippet won't work, because dynamic memory allocation at kernel runtime is not supported.
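
The usual workaround is to allocate the scratch space from the host with cudaMalloc and pass the pointer in as a kernel parameter; a minimal sketch (d_scratch, numThreads, and the extra parameter are illustrative, not from the question):

 // host side: one size-element slice per thread, allocated up front
 long *d_scratch;
 size_t bytes = (size_t)numThreads * size * sizeof(long);
 cudaMalloc((void **)&d_scratch, bytes);

 // the kernel indexes its own slice at d_scratch + tid * size
 compute<<<dimGrid, dimBlock>>>(c1, size, d_scratch);

 cudaFree(d_scratch);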

macs