+3  A: 

The problem is that finalizers run on the GC's finalizer thread, and a CUDA resource allocated in one thread can't be used in another. An excerpt from the CUDA programming guide:

Several host threads can execute device code on the same device, but by design, a host thread can execute device code on only one device. As a consequence, multiple host threads are required to execute device code on multiple devices. Also, any CUDA resources created through the runtime in one host thread cannot be used by the runtime from another host thread.

Your best bet is to use the using statement, which ensures that Dispose() always gets called at the end of the 'protected' code block, on the same thread that entered it:

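using (CudaEntity ent = new CudaEntity())
{
    // Work with ent here. Dispose() runs on this thread when the block exits, even if an exception is thrown.
}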
arul
Thanks. I usually don't use a CudaEntity within a single block, so that solution won't help in most cases. I'll just have to inspect all the code to make sure Dispose() is always called when overwriting a CudaEntity or when an object containing CudaEntities is disposed.
Danny Varod
+1. I've found that when I need to use a CUDA wrapper object from multiple threads, the best bet is to keep a private Thread member in the wrapper class, and run all of the DllImport calls on that thread, so as to hide the thread-affinity details from client code.
Gabriel
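A minimal sketch of the dedicated-thread idea from the comment above, not a definitive implementation. The class name SingleThreadDispatcher and its Invoke method are hypothetical; in a real wrapper the delegates passed to Invoke would contain the DllImport calls, so every CUDA call lands on the one thread that owns the resources.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: runs every submitted delegate on one dedicated thread.
public sealed class SingleThreadDispatcher : IDisposable
{
    private readonly BlockingCollection<Action> _work = new BlockingCollection<Action>();
    private readonly Thread _thread;

    public SingleThreadDispatcher()
    {
        _thread = new Thread(() =>
        {
            // Blocks until work arrives; the loop ends once CompleteAdding()
            // has been called and the queue is empty.
            foreach (Action action in _work.GetConsumingEnumerable())
            {
                action();
            }
        });
        _thread.IsBackground = true;
        _thread.Start();
    }

    // Runs func on the dedicated thread and blocks the caller until it returns.
    // Exceptions thrown by func are rethrown on the calling thread.
    public T Invoke<T>(Func<T> func)
    {
        var tcs = new TaskCompletionSource<T>();
        _work.Add(() =>
        {
            try { tcs.SetResult(func()); }
            catch (Exception ex) { tcs.SetException(ex); }
        });
        return tcs.Task.GetAwaiter().GetResult();
    }

    // Stops accepting work, lets queued work finish, then joins the thread.
    public void Dispose()
    {
        _work.CompleteAdding();
        _thread.Join();
        _work.Dispose();
    }
}
```

Calling Invoke from inside a delegate that is already running on the dispatcher thread would deadlock, so a real wrapper would need to guard against re-entrant calls.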
Is it possible to dispatch during dispose? I'm pretty sure the GC freezes all other threads unless the server GC model is in use.
Danny Varod
How about adding a static dispose queue: during finalization, add the pointer to the queue, and on the next allocation or dispose, dispose everything on the queue?
Danny Varod
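The dispose-queue idea from the comment above might look roughly like the sketch below. All names here (DeferredFreeQueue, CudaBuffer, and the allocate and free delegates) are hypothetical stand-ins; the delegates take the place of the real DllImport'ed allocation and free calls. The sketch assumes a single owning thread, which is the thread that creates the queue.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical queue of pointers waiting to be freed on the owning thread.
public sealed class DeferredFreeQueue
{
    private readonly ConcurrentQueue<IntPtr> _pending = new ConcurrentQueue<IntPtr>();
    private readonly Action<IntPtr> _free;
    private readonly int _ownerThreadId = Thread.CurrentThread.ManagedThreadId;

    // free is a stand-in for the native free call.
    public DeferredFreeQueue(Action<IntPtr> free) { _free = free; }

    // Safe from any thread, including the finalizer thread.
    public void Enqueue(IntPtr ptr) { _pending.Enqueue(ptr); }

    // Frees everything queued so far, but only when called on the owning
    // thread; on any other thread it does nothing and returns 0.
    public int Drain()
    {
        if (Thread.CurrentThread.ManagedThreadId != _ownerThreadId) return 0;
        int freed = 0;
        IntPtr ptr;
        while (_pending.TryDequeue(out ptr)) { _free(ptr); freed++; }
        return freed;
    }
}

// Hypothetical wrapper showing where the queue is filled and drained.
public sealed class CudaBuffer : IDisposable
{
    private readonly DeferredFreeQueue _queue;
    private IntPtr _ptr;

    public CudaBuffer(DeferredFreeQueue queue, Func<IntPtr> allocate)
    {
        _queue = queue;
        _queue.Drain();      // free leftovers from finalized wrappers first
        _ptr = allocate();   // stand-in for the native allocation call
    }

    public void Dispose()
    {
        EnqueueOnce();
        _queue.Drain();      // freed immediately if we are on the owning thread
        GC.SuppressFinalize(this);
    }

    // Finalizer thread: only record the pointer, never call into CUDA here.
    ~CudaBuffer() { EnqueueOnce(); }

    private void EnqueueOnce()
    {
        IntPtr ptr = Interlocked.Exchange(ref _ptr, IntPtr.Zero);
        if (ptr != IntPtr.Zero) _queue.Enqueue(ptr);
    }
}
```

With this design the finalizer stays a safety net only: a leaked buffer is not freed until the owning thread next allocates or disposes something, so explicit Dispose() calls remain the primary cleanup path.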
Yes, that may work. But you'll need to route all CUDA calls through the allocating/freeing thread. It's a rather hackish approach, and way too defensive. What is the problem with releasing the resource once it's no longer needed?
arul
I intend to do that, but I'd prefer to have a GC safety net, since I use many CUDA resources in many places and transfer them from place to place.
Danny Varod
Another issue is unit testing: a failed assert can cause a dispose to be skipped, and each test is run from a different thread. I managed to implement a garbage stack that works even when unit testing.
Danny Varod