ansaurus

Question

CUDA global memory deallocation issues in .NET

Answer 1

+3 A:

The problem is that finalizers are executed on the GC's finalizer thread, and a CUDA resource allocated in one thread can't be used in another one. An excerpt from the CUDA Programming Guide:

Several host threads can execute device code on the same device, but by design, a host thread can execute device code on only one device. As a consequence, multiple host threads are required to execute device code on multiple devices. Also, any CUDA resources created through the runtime in one host thread cannot be used by the runtime from another host thread.

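Your best bet is to use the using statement, which ensures that Dispose() is always called at the end of the 'protected' code block, on the same thread that entered it:

using (CudaEntity ent = new CudaEntity())
{
    // Work with ent here; Dispose() runs when the block exits, even if an exception is thrown.
}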

arul 2009-09-19 23:52:04

Thanks. In most cases I don't use a CudaEntity within a single block, so that solution won't help much. I'll just have to inspect all the code to make sure Dispose() is always called when a CudaEntity is overwritten or when an object containing CudaEntities is disposed.

Danny Varod 2009-09-20 00:18:55

+1. I've found that when I need to use a CUDA wrapper object from multiple threads, the best bet is to keep a private Thread member in the wrapper class, and run all of the DllImport calls on that thread, so as to hide the thread-affinity details from client code.

Gabriel 2009-09-20 00:26:49
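
A minimal sketch of the dedicated-thread idea from the comment above, not the commenter's actual code. The class name CudaThread and the Invoke method are hypothetical; in a real wrapper the delegate passed to Invoke would contain the DllImport calls. It assumes a runtime that provides BlockingCollection (.NET 4 or later).

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: every delegate passed to Invoke runs on one private thread,
// so all CUDA calls made through it share the same host thread.
public sealed class CudaThread : IDisposable
{
    private readonly BlockingCollection<Action> _work = new BlockingCollection<Action>();
    private readonly Thread _thread;

    public CudaThread()
    {
        _thread = new Thread(() =>
        {
            // Blocks until work arrives; ends once CompleteAdding() has been called
            // and the queue is empty.
            foreach (Action job in _work.GetConsumingEnumerable())
                job();
        });
        _thread.IsBackground = true;
        _thread.Start();
    }

    // Runs 'call' on the private thread and blocks the caller until it finishes.
    // Exceptions thrown by 'call' are rethrown to the caller.
    // Must not be called from the private thread itself (it would wait on itself).
    public T Invoke<T>(Func<T> call)
    {
        var tcs = new TaskCompletionSource<T>();
        _work.Add(() =>
        {
            try { tcs.SetResult(call()); }
            catch (Exception e) { tcs.SetException(e); }
        });
        return tcs.Task.GetAwaiter().GetResult();
    }

    public void Dispose()
    {
        _work.CompleteAdding();   // let the worker drain remaining jobs and exit
        _thread.Join();
        _work.Dispose();
    }
}
```

Client code can then call something like cuda.Invoke(() => NativeAlloc(size)) from any thread, where NativeAlloc stands for whatever DllImport the wrapper exposes. The cost is one cross-thread hop per call, which is why this fits best around coarse-grained operations.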

Is it possible to dispatch to another thread during dispose? I'm pretty sure the GC freezes all other threads unless the server GC model is used.

Danny Varod 2009-09-20 10:30:00

How about adding a static dispose queue? During finalize, add the pointer to the queue; then, on the next allocation or dispose, free everything on the queue.

Danny Varod 2009-09-20 10:36:31
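
A rough sketch of the dispose-queue idea proposed above, under stated assumptions: DeferredFree and CudaBuffer are hypothetical names, the free delegate stands in for the native call (for example a DllImport wrapping cudaFree), and all CUDA work is assumed to happen on a single thread. With several CUDA threads, one queue per owning thread would be needed.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical static queue of device pointers that the finalizer could not free itself.
public static class DeferredFree
{
    private static readonly ConcurrentQueue<IntPtr> Pending = new ConcurrentQueue<IntPtr>();

    // Safe to call from a finalizer: it only records the pointer, no CUDA call is made.
    public static void Enqueue(IntPtr devicePtr)
    {
        Pending.Enqueue(devicePtr);
    }

    // Call on the thread that owns the CUDA resources.
    // Frees every queued pointer and returns how many were freed.
    public static int Drain(Action<IntPtr> free)
    {
        int count = 0;
        IntPtr p;
        while (Pending.TryDequeue(out p))
        {
            free(p);
            count++;
        }
        return count;
    }
}

// Hypothetical wrapper around an already-allocated device pointer.
public sealed class CudaBuffer : IDisposable
{
    private IntPtr _ptr;
    private readonly Action<IntPtr> _free;

    public CudaBuffer(IntPtr ptr, Action<IntPtr> free)
    {
        DeferredFree.Drain(free);   // opportunistically release leaked buffers
        _ptr = ptr;
        _free = free;
    }

    public void Dispose()
    {
        if (_ptr != IntPtr.Zero)
        {
            _free(_ptr);            // normal path: free on the calling (owning) thread
            _ptr = IntPtr.Zero;
        }
        DeferredFree.Drain(_free);
        GC.SuppressFinalize(this);
    }

    // Safety net: runs on the finalizer thread, so it defers the free instead of calling CUDA.
    ~CudaBuffer()
    {
        if (_ptr != IntPtr.Zero)
            DeferredFree.Enqueue(_ptr);
    }
}
```

The finalizer never touches CUDA, so the thread-affinity restriction is respected; leaked memory is simply held until the owning thread next allocates or disposes something.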

Yes, that may work. But you'll need to route all CUDA calls through the allocating/freeing thread. It's a rather hackish approach, and way too defensive. What is the problem with releasing the resource once it's no longer needed?

arul 2009-09-20 12:15:01

I intend to do that, but I'd prefer to have a GC safety net, since I use many CUDA resources in many places and transfer them from place to place.

Danny Varod 2009-09-20 18:59:57

Another issue is unit testing: a failed assert can cause a dispose to be skipped, and each test is run from a different thread. I managed to implement a garbage stack that works even when unit testing.

Danny Varod 2009-10-09 00:49:45
