I am allocating some fairly large float arrays (e.g. 9,000,000 elements) on the GPU using cudaMalloc((void**)&(storage->data), size * sizeof(float)). At the end of my program, I free this memory using cudaFree(storage->data).

The problem is that the first deallocation is really slow, around 10 seconds, whereas the others are nearly instantaneous.

My question is: what could cause this difference? Is deallocating memory on a GPU usually that slow?

+1  A: 

It should not be that slow; on Linux with CUDA 2.2 it takes a fraction of a second. Have you tried running the host and device profilers to see exactly why it is slow? How many separate allocations do you perform? That does carry some penalty, but not one that large.

aaa
+2  A: 

As pointed out on the NVIDIA forums, it's almost certainly a problem with the way you are timing things rather than with cudaFree.
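
One common way this kind of mis-timing arises (an assumption here, since the details of the asker's program are not shown) is that kernel launches return immediately while cudaFree waits for pending GPU work, so the first cudaFree ends up absorbing the kernel's run time. Below is a minimal sketch of timing the two separately. The kernel busyKernel, the helper secondsSince, and the iteration count are hypothetical stand-ins, and cudaDeviceSynchronize is the current name for the call (older toolkits such as CUDA 2.2 used cudaThreadSynchronize).

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel standing in for whatever work the real program does.
__global__ void busyKernel(float *data, size_t n, int iters) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) {
        float v = data[i];
        for (int k = 0; k < iters; ++k) v = v * 1.0001f + 0.5f;
        data[i] = v;
    }
}

// Hypothetical helper: wall-clock seconds elapsed since t0.
static double secondsSince(std::chrono::steady_clock::time_point t0) {
    return std::chrono::duration<double>(
        std::chrono::steady_clock::now() - t0).count();
}

int main() {
    const size_t n = 9000000;  // array size taken from the question
    float *data = nullptr;
    if (cudaMalloc((void**)&data, n * sizeof(float)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }
    cudaMemset(data, 0, n * sizeof(float));

    const int threads = 256;
    const int blocks = (int)((n + threads - 1) / threads);
    // The launch is asynchronous: control returns before the kernel finishes.
    busyKernel<<<blocks, threads>>>(data, n, 10000);

    // Wait for the kernel first, so its run time is not charged to cudaFree.
    auto t0 = std::chrono::steady_clock::now();
    cudaError_t err = cudaDeviceSynchronize();
    printf("waiting for kernel: %.3f s\n", secondsSince(t0));
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // With the device idle, this interval covers only the deallocation.
    t0 = std::chrono::steady_clock::now();
    cudaFree(data);
    printf("cudaFree alone:     %.3f s\n", secondsSince(t0));
    return 0;
}
```

If the explicit synchronize is removed, the wait would be expected to show up in the cudaFree measurement instead, which would match the "first free is slow, the rest are instant" pattern described in the question.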

Eric
Yes, that was the problem. I asked on both SO and the NVIDIA forums to make sure that someone competent would answer, and I got what I wanted on both ;) ! Awesome, guys! Thanks!
Wookai