Hi all,

I'm developing a mechanism for interchanging data between two or more processes using shared memory on Linux. The problem is that some level of concurrency control is required to maintain data integrity on the shared memory itself, and since I expect that at some point one of my processes could crash or be killed, common lock mechanisms don't work: a dying process could leave the memory in a "locked" state, making the other processes hang waiting for a lock that will never be released.

So, doing some research I've found that System V semaphores have a flag called SEM_UNDO which can revert the lock state when the program fails, but that's not guaranteed to work. Another option is to monitor the PIDs of all processes that might use the shared memory and take some action on them if something nasty happens, but I'm not so sure this is the right approach to my problem.

Any ideas?? :)

Edit: for explanation purposes, our app needs an IPC mechanism with the smallest latency possible, so I'm also open to other mechanisms that can meet this requirement.

+1  A: 

There are only a few things that are guaranteed to be cleaned up when a program fails. The only thing that comes to my mind here is link counts: an open file descriptor increases the link count of the underlying inode and the corresponding close decreases it, including the forced close when the program fails.

So your processes could all open a common file (I don't remember if this works for shared memory segments) and you could trigger some sort of alarm if the count decreases where it shouldn't. E.g., instead of doing a plain wait, your processes could do a timedwait (for a second, say) in a loop and poll the link count so they are alerted when something goes wrong.
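
A minimal sketch of that timed-wait-and-poll pattern, assuming a POSIX semaphore; peers_look_healthy() is a placeholder for whatever liveness check you end up using, not a real API:

    /* Wait at most one second at a time, re-checking peers between tries. */
    #include <errno.h>
    #include <semaphore.h>
    #include <stdbool.h>
    #include <time.h>

    /* placeholder: replace with a real check, e.g. kill(holder_pid, 0) */
    static bool peers_look_healthy(void) { return true; }

    int lock_with_watchdog(sem_t *sem)
    {
        for (;;) {
            struct timespec ts;
            clock_gettime(CLOCK_REALTIME, &ts);
            ts.tv_sec += 1;                      /* wait at most one second */

            if (sem_timedwait(sem, &ts) == 0)
                return 0;                        /* got the lock */
            if (errno != ETIMEDOUT)
                return -1;                       /* real error */
            if (!peers_look_healthy())
                return -2;                       /* raise the alarm, recover */
            /* otherwise just keep waiting */
        }
    }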

Jens Gustedt
if you use futexes correctly the kernel will clean them up
Spudd86
The question is tagged posix. IIRC futexes are a pure Linux construct and not portable to other POSIX systems.
Jens Gustedt
After some thinking, there is actually a lock structure that is POSIX compliant and is guaranteed to be cleaned up on process termination: advisory file locks by means of fcntl. They are a bit tricky to use (pitfall: you lose the lock when any fd on the same inode is closed by any thread of the process) but they are realized entirely in kernel space and nothing is actually written to disk.
Jens Gustedt
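
A minimal sketch of the fcntl() advisory-lock approach from that comment; the kernel drops the lock automatically when the holding process dies, and the lock-file path and function names here are illustrative only:

    #include <fcntl.h>
    #include <unistd.h>

    int acquire_file_lock(const char *path)      /* returns an fd, or -1 */
    {
        int fd = open(path, O_RDWR | O_CREAT, 0666);
        if (fd < 0)
            return -1;

        struct flock fl = { 0 };
        fl.l_type   = F_WRLCK;                   /* exclusive lock */
        fl.l_whence = SEEK_SET;
        fl.l_start  = 0;
        fl.l_len    = 0;                         /* 0 = the whole file */

        if (fcntl(fd, F_SETLKW, &fl) < 0) {      /* blocks until acquired */
            close(fd);
            return -1;
        }
        return fd;             /* keep this fd open while holding the lock */
    }

    void release_file_lock(int fd)
    {
        close(fd);             /* closing any fd on the inode drops the lock */
    }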
+1  A: 

When you stated that semaphores can't cleanly handle a crashing process I was a little surprised. That sort of support seems fairly fundamental! Looking at the semop man page both on my Ubuntu 10.04 system and on the web here seems to suggest that it should be OK. Hopefully the memory used to store the SEM_UNDO count is kept in kernel space, and hence safe from errant memory writes.

Truth be told, though, even a reliable semaphore locking mechanism might not completely solve your problem. If you're using locks to allow for transaction processing, you also need to handle the situation where a transaction is halted part way through by a crash, while still allowing another program to access the data structure.
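
For reference, a rough sketch of a SysV semaphore used as a mutex with SEM_UNDO, so the kernel reverses a pending "down" if the holder dies; the key value is arbitrary and error handling is omitted:

    #include <sys/types.h>
    #include <sys/ipc.h>
    #include <sys/sem.h>

    union semun { int val; struct semid_ds *buf; unsigned short *array; };

    static int sem_id;

    void shm_lock_init(void)
    {
        sem_id = semget(0x1234, 1, IPC_CREAT | 0666);
        union semun arg = { .val = 1 };          /* start unlocked */
        semctl(sem_id, 0, SETVAL, arg);
    }

    void shm_lock(void)
    {
        struct sembuf op = { .sem_num = 0, .sem_op = -1, .sem_flg = SEM_UNDO };
        semop(sem_id, &op, 1);                   /* undone if we die holding it */
    }

    void shm_unlock(void)
    {
        struct sembuf op = { .sem_num = 0, .sem_op = +1, .sem_flg = SEM_UNDO };
        semop(sem_id, &op, 1);
    }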

torak
Sorry, I haven't found the source where I read that. Still, System V semaphores have a limit of about 37k SEM_UNDO structures that can be held by a process, so it wouldn't work anyway, as my application can write that many messages _very_ fast. Thanks anyway.
scooterman
A: 

I would be curious to know what source you used that said SEM_UNDO was not guaranteed to work. I have not heard that before. I seem to remember reading articles claiming Linux's SysV IPC in general was buggy, but that was quite a while ago. I am wondering if your info is just an artifact of times past.

The other thing to consider (if I remember correctly) is that SysV semaphores can tell you the PID of the last process to perform a semaphore operation. If you hang, you should be able to query it to see whether the process holding the lock is still alive. Since any process (not just the one holding the lock) can fiddle with the semaphore, you might exercise control that way.
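
A small sketch of that PID check, assuming sem_id already refers to the SysV semaphore in question:

    /* GETPID returns the PID of the last process that did a semop();
     * kill(pid, 0) tells you whether that process still exists. */
    #include <errno.h>
    #include <signal.h>
    #include <stdbool.h>
    #include <sys/types.h>
    #include <sys/ipc.h>
    #include <sys/sem.h>

    bool last_locker_alive(int sem_id)
    {
        pid_t pid = semctl(sem_id, 0, GETPID);
        if (pid <= 0)
            return false;                        /* never used, or error */
        /* signal 0 delivers nothing, only checks existence; EPERM still
         * means the process exists, just under another uid */
        return kill(pid, 0) == 0 || errno == EPERM;
    }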

Lastly, I'll put in a pitch for message queues. They might not be appropriate for your speed requirements, but they are generally not that much slower than shared memory. In essence they do everything you would have to do manually with SHM anyway, except that the OS does it all beneath the covers. You get almost as much speed, plus synchronization, atomicity, ease of use, and a thoroughly tested mechanism for free.
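
A minimal sketch of the message-queue route; the queue name and sizes are made up, and on older glibc you need to link with -lrt:

    #include <fcntl.h>
    #include <mqueue.h>
    #include <string.h>
    #include <sys/types.h>

    #define QUEUE_NAME "/my_ipc_queue"

    int send_message(const char *msg)
    {
        struct mq_attr attr = { .mq_maxmsg = 10, .mq_msgsize = 1024 };
        mqd_t q = mq_open(QUEUE_NAME, O_CREAT | O_WRONLY, 0666, &attr);
        if (q == (mqd_t)-1)
            return -1;
        int rc = mq_send(q, msg, strlen(msg) + 1, 0);   /* 0 = priority */
        mq_close(q);
        return rc;
    }

    ssize_t receive_message(char *buf, size_t len)      /* len >= mq_msgsize */
    {
        mqd_t q = mq_open(QUEUE_NAME, O_RDONLY);
        if (q == (mqd_t)-1)
            return -1;
        ssize_t n = mq_receive(q, buf, len, NULL);      /* blocks for a message */
        mq_close(q);
        return n;
    }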

Duck
I'll take this as accepted because I've switched to message queues and found them adaptable to my needs. Thanks
scooterman
A: 

You can use a pthread mutex in shared memory by setting the process-shared attribute with pthread_mutexattr_setpshared ( http://linux.die.net/man/3/pthread_mutexattr_setpshared ).

Also, you could try to use futexes directly; see http://people.redhat.com/drepper/futex.pdf , http://lxr.linux.no/#linux+v2.6.34/Documentation/robust-futexes.txt , http://www.kernel.org/doc/man-pages/online/pages/man7/futex.7.html and http://www.kernel.org/doc/man-pages/online/pages/man2/futex.2.html , particularly the second one, since it talks about getting the kernel to release the lock when the process holding it dies.

Also, I think it's possible to make the pthreads locks/CVs robust, which is a better idea, since then all the work of handling robust locks is done for you. In any even remotely modern distro pthread_mutex should be using the robust futexes described in http://lxr.linux.no/#linux+v2.6.34/Documentation/robust-futexes.txt IIRC, since that support has been in the kernel for quite a while, but you might want to make sure you don't need to do anything extra to make your pthread_mutex robust.
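
A rough sketch of a robust, process-shared mutex set up in shared memory; error handling is trimmed, the shm name is illustrative, and on older glibc the robust calls are the *_np variants (pthread_mutexattr_setrobust_np / pthread_mutex_consistent_np):

    #include <errno.h>
    #include <fcntl.h>
    #include <pthread.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    pthread_mutex_t *create_shared_mutex(void)
    {
        int fd = shm_open("/my_shm_lock", O_CREAT | O_RDWR, 0666);
        ftruncate(fd, sizeof(pthread_mutex_t));
        pthread_mutex_t *m = mmap(NULL, sizeof *m, PROT_READ | PROT_WRITE,
                                  MAP_SHARED, fd, 0);

        pthread_mutexattr_t attr;
        pthread_mutexattr_init(&attr);
        pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
        pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
        pthread_mutex_init(m, &attr);
        pthread_mutexattr_destroy(&attr);
        return m;
    }

    int lock_shared_mutex(pthread_mutex_t *m)
    {
        int rc = pthread_mutex_lock(m);
        if (rc == EOWNERDEAD) {
            /* the previous owner died holding the lock: repair the data it
             * protects, then mark the mutex usable again */
            pthread_mutex_consistent(m);
            rc = 0;
        }
        return rc;
    }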

Spudd86
+2  A: 

So, doing some research I've found that System V semaphores have a flag called SEM_UNDO which can revert the lock state when the program fails, but that's not guaranteed to work.

SEM_UNDO will unlock the semaphore if the process crashes. But if the process crashed because it corrupted the shared memory, there is nothing semaphores can do for you: the OS can't undo the state of the shared memory.

If you need to be able to roll back the state of the shared memory, then you have to implement something on your own. I have seen at least two models that deal with that.

The first model, before modifying anything in shared memory, takes a snapshot of the structure and saves it in a list kept in the shared memory itself. If any other process then gets the lock and the list isn't empty, it undoes whatever the crashed process might have changed.
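
A sketch of that first model, with illustrative types and sizes; all functions assume the caller holds the lock:

    /* Before changing a record, push a copy onto an undo list kept in the
     * same shared memory segment; whoever takes the lock next replays that
     * list if it is non-empty. */
    #include <stddef.h>

    #define UNDO_SLOTS 16

    typedef struct { char payload[256]; } record_t;

    typedef struct {
        int count;                                /* pending undo entries */
        struct { size_t index; record_t saved; } undo[UNDO_SLOTS];
        record_t records[1024];                   /* the shared data itself */
    } shm_region_t;

    /* call before modifying records[index] */
    void save_undo(shm_region_t *shm, size_t index)
    {
        shm->undo[shm->count].index = index;
        shm->undo[shm->count].saved = shm->records[index];
        shm->count++;             /* only now does the entry become visible */
    }

    /* call right after acquiring the lock: roll back a crashed writer */
    void recover(shm_region_t *shm)
    {
        while (shm->count > 0) {
            shm->count--;
            shm->records[shm->undo[shm->count].index] = shm->undo[shm->count].saved;
        }
    }

    /* call just before releasing the lock, once the transaction is done */
    void commit(shm_region_t *shm)
    {
        shm->count = 0;
    }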

The second model is to make copies of the shm structures in local memory and keep the lock held for the whole transaction. When the transaction is committed, just before releasing the lock, simply copy the structures from local memory back into the shared memory. The probability that the app crashes during that copy is much lower, and interruption by external signals can be blocked using sigprocmask(). (Locking in this case had better be well partitioned over the data. E.g. I have seen tests with a set of 1000 locks for 10 million records in shm accessed by 4 concurrent processes.)
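
And a sketch of the second model; record_t and the lock/unlock callbacks are placeholders, not anything from the answer:

    /* Do the work on a private copy and push it back into shared memory
     * in one short, signal-blocked window. */
    #include <signal.h>
    #include <string.h>

    typedef struct { char payload[256]; } record_t;     /* illustrative */

    void update_record(record_t *shm_rec,               /* lives in shared memory */
                       void (*modify)(record_t *),      /* the actual transaction */
                       void (*lock)(void), void (*unlock)(void))
    {
        record_t local;

        lock();
        memcpy(&local, shm_rec, sizeof local);    /* snapshot into private memory */

        modify(&local);                           /* long work: a crash here leaves
                                                     the shared copy untouched */

        sigset_t all, old;
        sigfillset(&all);
        sigprocmask(SIG_BLOCK, &all, &old);       /* keep signals out of the commit */
        memcpy(shm_rec, &local, sizeof local);    /* short commit window */
        sigprocmask(SIG_SETMASK, &old, NULL);

        unlock();
    }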

Dummy00001
Very interesting stuff. Thank you!
scooterman