Hi, I am having trouble allocating an array of arrays in CUDA.

void ** data;
cudaMalloc(&data, sizeof(void**)*N); // allocates without problems
for(int i = 0; i < N; i++) {
    cudaMalloc(data + i, getSize(i) * sizeof(void*)); // seg fault is thrown
}

What did I do wrong?

+4  A: 

I don't believe this is supported. cudaMalloc() allocates device memory, but stores the address in a variable on the host. In your for-loop, you are passing it addresses in device memory.

Depending on what you're trying to accomplish, you may want to allocate data with normal host malloc() before calling the for-loop as you currently have it. Or allocate a single big block of device memory and compute offsets into it manually.
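
The single-block option can be sketched on the host side roughly as follows. This is only an illustration under assumptions: getSize() here is a made-up stand-in for the function in the question, and alignUp() and computeOffsets() are hypothetical helper names, not part of the CUDA API. The idea is to compute a running sum of the (aligned) sub-array sizes, so that sub-array i starts at base + offsets[i] and offsets[N] is the total number of bytes to allocate.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the question's getSize(i): bytes needed by sub-array i.
std::size_t getSize(int i) { return static_cast<std::size_t>(i + 1) * 3; }

// Round n up to a multiple of align so every sub-array starts on an aligned boundary.
std::size_t alignUp(std::size_t n, std::size_t align) {
    assert(align > 0);
    return (n + align - 1) / align * align;
}

// offsets[i] is the byte offset of sub-array i inside the block;
// offsets[N] is the total size of the block.
std::vector<std::size_t> computeOffsets(int N, std::size_t align) {
    std::vector<std::size_t> offsets(N + 1, 0);
    for (int i = 0; i < N; i++) {
        offsets[i + 1] = offsets[i] + alignUp(getSize(i), align);
    }
    return offsets;
}

// Address of sub-array i, given the base address of the single block.
char* subArray(char* base, const std::vector<std::size_t>& offsets, int i) {
    return base + offsets[i];
}
```

With CUDA, base would presumably come from one cudaMalloc of offsets[N] bytes, and the offsets table (or the base pointer plus an index computation) would be passed to the kernel instead of an array of separately allocated pointers.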

Look at Sections 2.4, 3.2.1 and B.2.5 (bottom) of the CUDA Programming Guide for more discussion of this. Specifically, on the bottom of page 108:

The address obtained by taking the address of a __device__, __shared__ or __constant__ variable can only be used in device code.

Gabriel
This is correct. The pointer variable that cudaMalloc writes to must reside in host memory, but you are trying to store these pointers in device memory. Instead, create an array of pointers on the host and copy it to the device at the end, or just compute offsets as Gabriel suggests.
Tom
+2  A: 

You have to allocate the array of pointers in host memory first, then allocate device memory for each array and store its pointer in that host array. Then allocate device memory for the pointer array itself and copy the host array into it. One example is worth 1000 words:

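#include <stdlib.h>

__global__ void multi_array_kernel( int N, void** arrays ){
    // stuff
}


int main(){

    const int N_ARRAYS = 20;

    // Host-side table that will hold one device pointer per array.
    void **h_array = (void**)malloc(sizeof(void*) * N_ARRAYS);
    for(int i = 0; i < N_ARRAYS; i++){
        // Pass &h_array[i] (host storage), not &d_array[i].
        // (i + 1) avoids a zero-byte request for i == 0.
        cudaMalloc(&h_array[i], (i + 1) * sizeof(void*));
        //TODO: check error
    }

    // Device-side copy of the pointer table.
    void **d_array;
    cudaMalloc(&d_array, sizeof(void*) * N_ARRAYS);

    // Copy to device memory
    cudaMemcpy(d_array, h_array, sizeof(void*) * N_ARRAYS, cudaMemcpyHostToDevice);

    multi_array_kernel<<<1,1>>>(N_ARRAYS, d_array);

    cudaDeviceSynchronize();

    for(int i = 0; i < N_ARRAYS; i++){
        cudaFree(h_array[i]); // h_array lives on the host, but each entry is a device pointer
        //TODO: check error
    }
    cudaFree(d_array);
    free(h_array);
}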
fabrizioM
+1  A: 

I think in the first loop it should be &h_array[i] not &d_array[i].

Jamshidi