
I've written a Linux device driver for a PCI device. This device performs DMA operations. A problem arises when the program using the device crashes while a DMA operation is running.

When the program crashes, the system calls my device_remove() function (as if close() had been called). This function cleans up the memory regions used by the PCI device and frees the allocated memory. Under normal circumstances this works correctly.

But if a DMA transfer is still running, then when it eventually completes it cannot perform the DMA cleanup, because the device data it needs has already been freed. A simple solution would be to wait in the close() function. (That is my understanding, at least; maybe the last part of the DMA function is never executed at all?)

Is it a good idea to wait for the DMA to actually finish in the device_remove() (aka close()) function of a device driver? Are there other ways to deal with this issue?
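To make the "wait before freeing" idea concrete, here is a userspace model of the pattern that runs outside the kernel. In a real driver one would typically use the kernel's `struct completion`: `init_completion()` when the transfer is started, `complete()` from the DMA-done interrupt handler, and `wait_for_completion()` (or `wait_for_completion_timeout()`) in `release()` before freeing anything. Everything below is an assumption for illustration: the completion is emulated with a pthread mutex and condition variable, the "DMA engine" is a thread, and `my_dev`, `my_open_and_start_dma()` and `my_release()` are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>
#include <time.h>

/* Userspace stand-in for the kernel's struct completion (assumption). */
struct completion {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool done;
};

static void completion_init(struct completion *c)
{
    pthread_mutex_init(&c->lock, NULL);
    pthread_cond_init(&c->cond, NULL);
    c->done = false;
}

static void completion_signal(struct completion *c)   /* like complete() */
{
    pthread_mutex_lock(&c->lock);
    c->done = true;
    pthread_cond_broadcast(&c->cond);
    pthread_mutex_unlock(&c->lock);
}

static void completion_wait(struct completion *c)     /* like wait_for_completion() */
{
    pthread_mutex_lock(&c->lock);
    while (!c->done)
        pthread_cond_wait(&c->cond, &c->lock);
    pthread_mutex_unlock(&c->lock);
}

/* Hypothetical per-open device state that release() frees. */
struct my_dev {
    int *dma_buf;               /* stands in for the DMA buffer */
    bool dma_cleaned_up;        /* set by the DMA completion path */
    struct completion dma_done; /* signalled when the transfer finishes */
    pthread_t dma_thread;       /* simulated DMA engine */
};

/* Simulated DMA engine: finishes later, then runs its cleanup, which
 * dereferences dev. This is the step that breaks if dev is already freed. */
static void *dma_engine(void *arg)
{
    struct my_dev *dev = arg;
    struct timespec ts = { .tv_sec = 0, .tv_nsec = 50 * 1000000L }; /* 50 ms */
    nanosleep(&ts, NULL);
    dev->dma_buf[0] = 42;       /* pretend data arrived */
    dev->dma_cleaned_up = true; /* completion-path cleanup */
    completion_signal(&dev->dma_done);
    return NULL;
}

static struct my_dev *my_open_and_start_dma(void)
{
    struct my_dev *dev = calloc(1, sizeof(*dev));
    if (!dev)
        return NULL;
    dev->dma_buf = calloc(16, sizeof(int));
    if (!dev->dma_buf) {
        free(dev);
        return NULL;
    }
    completion_init(&dev->dma_done);
    if (pthread_create(&dev->dma_thread, NULL, dma_engine, dev) != 0) {
        pthread_mutex_destroy(&dev->dma_done.lock);
        pthread_cond_destroy(&dev->dma_done.cond);
        free(dev->dma_buf);
        free(dev);
        return NULL;
    }
    return dev;
}

/* release(): block until the in-flight DMA has completed, only then free.
 * Returns whether the DMA cleanup ran before the memory was freed. */
static bool my_release(struct my_dev *dev)
{
    completion_wait(&dev->dma_done);
    pthread_join(dev->dma_thread, NULL); /* only needed to reap the model's thread */
    bool cleaned = dev->dma_cleaned_up;
    free(dev->dma_buf);
    pthread_mutex_destroy(&dev->dma_done.lock);
    pthread_cond_destroy(&dev->dma_done.cond);
    free(dev);
    return cleaned;
}
```

In a real `release()` a timeout variant is probably preferable, so that a wedged device cannot hang the exiting process forever; what to do on timeout (for example, stopping the transfer through a device register) depends on the hardware.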

A:

Yes, waiting should work, but:

Unless you are deliberately testing the surprise-removal behavior of your PCI device, I think a call to remove() should fail while DMA to or from the device is in progress. Also, I don't think close() can be treated the same way as remove(). The latter completely removes all device-related data structures from memory (see, for example, any of the network device drivers). In other words: wait in close(), but fail in remove().

Depending on your situation, you may also want to look at reference counting to decide when to free the device-related resources.
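The reference-counting suggestion can be sketched like this: the open file holds one reference, each in-flight DMA takes another, and whoever drops the last reference frees the memory. In the kernel this is what `struct kref` (`kref_init()`, `kref_get()`, `kref_put()` with a release callback) is for; the sketch below is only a userspace model using C11 atomics, and `dev_alloc()`, `dev_get()`, `dev_put()` and `free_count` are hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical device state that would otherwise be freed unconditionally. */
struct my_dev {
    atomic_int refcount;
    int *dma_buf;
};

/* Counts how many times the device memory was actually freed (demo only). */
static int free_count;

static struct my_dev *dev_alloc(void)
{
    struct my_dev *dev = malloc(sizeof(*dev));
    if (!dev)
        return NULL;
    dev->dma_buf = calloc(16, sizeof(int));
    if (!dev->dma_buf) {
        free(dev);
        return NULL;
    }
    atomic_init(&dev->refcount, 1); /* reference held by the open file */
    return dev;
}

/* Take an extra reference, e.g. when a DMA transfer is started. */
static void dev_get(struct my_dev *dev)
{
    atomic_fetch_add_explicit(&dev->refcount, 1, memory_order_relaxed);
}

/* Drop a reference; the last holder frees. Returns true if it freed. */
static bool dev_put(struct my_dev *dev)
{
    if (atomic_fetch_sub_explicit(&dev->refcount, 1, memory_order_acq_rel) != 1)
        return false;
    free(dev->dma_buf);
    free(dev);
    free_count++;
    return true;
}
```

With this scheme `release()` never has to block: it just drops its reference, and the DMA completion path frees the device if it turns out to be the last user.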

Bandan
Oh, yes. I was completely wrong. `remove()` is not called when the application using the device terminates; it is `release()` that is called! I was looking in the wrong direction.
Didier Trosset
Then I think what you are doing is fair enough. I just checked the igb driver: if the link is active, a bit is set. When close() is called, the driver calls test_and_set_bit(), followed by schedule_timeout() if test_and_set_bit() finds the bit already set.
Bandan
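The igb-style pattern mentioned in the comment above can be modelled in userspace roughly as follows. This is only a sketch: the kernel's `test_and_set_bit()` and `clear_bit()` are emulated with C11 `atomic_fetch_or()`/`atomic_fetch_and()`, a 1 ms `nanosleep()` stands in for `schedule_timeout()`, "followed by schedule_timeout()" is read as a retry loop, and `DEV_BUSY_BIT`, `model_test_and_set_bit()`, `model_clear_bit()` and `close_wait_for_idle()` are hypothetical names, not code taken from igb.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical "device busy" bit in a flags word. */
#define DEV_BUSY_BIT 0

/* Analogue of test_and_set_bit(): atomically set the bit and report
 * whether it was already set. */
static bool model_test_and_set_bit(int nr, atomic_ulong *flags)
{
    unsigned long mask = 1UL << nr;
    return (atomic_fetch_or(flags, mask) & mask) != 0;
}

/* Analogue of clear_bit(): called by whoever finished using the device. */
static void model_clear_bit(int nr, atomic_ulong *flags)
{
    atomic_fetch_and(flags, ~(1UL << nr));
}

/* close() path: retry until this caller owns the bit, sleeping briefly
 * between attempts. Returns the number of retries it needed. */
static int close_wait_for_idle(atomic_ulong *flags)
{
    int retries = 0;
    struct timespec ts = { .tv_sec = 0, .tv_nsec = 1000000L }; /* 1 ms */
    while (model_test_and_set_bit(DEV_BUSY_BIT, flags)) {
        nanosleep(&ts, NULL); /* stands in for schedule_timeout() */
        retries++;
    }
    return retries;
}
```

Whoever set the bit (here, the code that starts a DMA transfer) is expected to clear it when it is done; otherwise the close() path loops forever, so a bounded number of retries may be the safer choice.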
