Hi,

My scenario: one server and some clients (though not many). The server can only respond to one client at a time, so they must be queued up. I'm using a mutex (boost::interprocess::interprocess_mutex) to do this, wrapped in a boost::interprocess::scoped_lock.

The thing is, if one client dies unexpectedly (i.e. no destructor runs) while holding the mutex, the other clients are in trouble, because they are waiting on that mutex. I've considered using a timed wait, so that if a client waits for, say, 20 seconds and doesn't get the mutex, it goes ahead and talks to the server anyway.
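For reference, the timed-wait fallback would look roughly like this (a sketch only; the function name and the 20-second timeout are just placeholders):

    #include <boost/interprocess/sync/interprocess_mutex.hpp>
    #include <boost/interprocess/sync/scoped_lock.hpp>
    #include <boost/date_time/posix_time/posix_time.hpp>
    using namespace boost::interprocess;

    void talk_to_server(interprocess_mutex& mtx)
    {
        boost::posix_time::ptime deadline =
            boost::posix_time::microsec_clock::universal_time() +
            boost::posix_time::seconds(20);

        scoped_lock<interprocess_mutex> lock(mtx, deadline); // timed lock
        if (!lock) {
            // Timed out: assume the holder died and proceed anyway.
            // This is exactly the race described in the next paragraph.
        }
        // ... talk to the server ...
    }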

Problems with this approach: 1) it does this every time; if the client is in a loop, talking to the server constantly, it has to wait out the timeout on every single iteration. 2) If there are three clients and one of them dies while holding the mutex, the other two will just wait 20 seconds and then talk to the server at the same time - exactly what I was trying to avoid.

So, how can I say to a client, "hey there, it seems this mutex has been abandoned, take ownership of it"?

+4  A: 

Unfortunately, this isn't supported by the boost::interprocess API as-is. There are a few ways you could implement it, however:

If you are on a POSIX platform with support for pthread_mutexattr_setrobust_np, edit boost/interprocess/sync/posix/thread_helpers.hpp and boost/interprocess/sync/posix/interprocess_mutex.hpp to use robust mutexes, and to handle the EOWNERDEAD return from pthread_mutex_lock somehow.
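Roughly, the calls the patched headers would need to make look like this (sketch only, error handling omitted; the *_np names are the older glibc spellings of the now-standard pthread_mutexattr_setrobust / pthread_mutex_consistent):

    #include <pthread.h>
    #include <errno.h>

    void init_robust_mutex(pthread_mutex_t* m)  // m must live in shared memory
    {
        pthread_mutexattr_t attr;
        pthread_mutexattr_init(&attr);
        pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
        pthread_mutexattr_setrobust_np(&attr, PTHREAD_MUTEX_ROBUST_NP);
        pthread_mutex_init(m, &attr);
        pthread_mutexattr_destroy(&attr);
    }

    bool lock_robust_mutex(pthread_mutex_t* m)
    {
        int rc = pthread_mutex_lock(m);
        if (rc == EOWNERDEAD) {
            // The previous owner died while holding the lock; the data it
            // protected may be inconsistent. Repair it, then mark the mutex
            // usable again, otherwise it becomes permanently unusable
            // (later locks fail with ENOTRECOVERABLE).
            pthread_mutex_consistent_np(m);
            rc = 0;
        }
        return rc == 0;
    }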

If you are on some other platform, you could edit boost/interprocess/sync/emulation/interprocess_mutex.hpp to use a generation counter, with the locked flag in the lower bit. Then you can create a reclaim protocol that will set a flag in the lock word to indicate a pending reclaim, then do a compare-and-swap after a timeout to check that the same generation is still in the lock word, and if so replace it with a locked next-generation value.
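A bare-bones sketch of that generation-counter scheme with C++11 std::atomic (an assumption on my part: lock-free atomics placed in the shared segment behave across processes; the explicit reclaim-pending flag is left out for brevity):

    #include <atomic>
    #include <chrono>
    #include <cstdint>
    #include <thread>

    // Lock word layout: bit 0 = locked flag, bits 1..63 = generation counter.
    struct reclaimable_mutex
    {
        std::atomic<std::uint64_t> word{0};   // must live in shared memory

        // Returns the value we installed; the caller passes it back to unlock().
        std::uint64_t lock(std::chrono::milliseconds reclaim_after)
        {
            const auto deadline = std::chrono::steady_clock::now() + reclaim_after;
            for (;;) {
                std::uint64_t v = word.load(std::memory_order_relaxed);
                if ((v & 1) == 0) {                          // currently unlocked
                    if (word.compare_exchange_weak(v, v | 1,
                            std::memory_order_acquire))
                        return v | 1;                        // same generation, now locked
                } else if (std::chrono::steady_clock::now() > deadline) {
                    // Presumed abandoned: take the lock with the next generation,
                    // but only if the word is still exactly what we observed.
                    std::uint64_t next = (((v >> 1) + 1) << 1) | 1;
                    if (word.compare_exchange_strong(v, next,
                            std::memory_order_acquire))
                        return next;                         // reclaimed
                }
                std::this_thread::yield();                   // busy-wait politely
            }
        }

        void unlock(std::uint64_t my_value)
        {
            // Clear the locked bit only if nobody reclaimed the lock from us in
            // the meantime; if they did, the word holds a newer generation and
            // we must leave it alone.
            std::uint64_t expected = my_value;
            word.compare_exchange_strong(expected, my_value & ~std::uint64_t(1),
                                         std::memory_order_release);
        }
    };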

If you're on Windows, another good option would be to use native mutex objects; they'll likely be more efficient than busy-waiting anyway.
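Sketch of the Windows route (the mutex name is just an example): WaitForSingleObject reports abandonment directly via WAIT_ABANDONED, so no extra protocol is needed.

    #include <windows.h>

    void talk_to_server()
    {
        HANDLE h = CreateMutexA(NULL, FALSE, "Global\\MyServerQueueMutex");
        DWORD r = WaitForSingleObject(h, INFINITE);
        if (r == WAIT_OBJECT_0 || r == WAIT_ABANDONED) {
            // WAIT_ABANDONED means the previous owner died while holding the
            // mutex; the caller now owns it and can proceed (after checking
            // any shared state the dead client may have left half-written).
            // ... talk to the server ...
            ReleaseMutex(h);
        }
        CloseHandle(h);
    }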

You may also want to reconsider the use of a shared-memory protocol - why not use a network protocol instead?

bdonlan
Great answer. I don't think I'll be implementing it, though; it doesn't seem worth the trouble - I'll think of something else. About your suggestion of using a network protocol, I couldn't agree with you more. Unfortunately, it's just too late in the game to change things so radically.
Pedro d'Aquino
A: 

Hello sir,

I tried to use pthread robust mutexes with the boost interprocess calls.

I edited boost/interprocess/sync/posix/thread_helpers.hpp and boost/interprocess/sync/posix/interprocess_mutex.hpp to use robust mutexes and to handle the EOWNERDEAD return from pthread_mutex_lock,

but nothing has changed: the second process is still stuck in the lock call and never gets EOWNERDEAD back.

My pthread.h has the robust/consistent_np declarations, e.g.:

    /* Set the robustness flag of the mutex attribute ATTR.  */
    extern int pthread_mutexattr_setrobust_np (pthread_mutexattr_t *__attr,
                                               int __robustness)
         __THROW __nonnull ((1));

So I am assuming my OS supports robust mutexes.

How can I check whether the mutexes are actually robust?

Basically, I tested all this using gdb. I ran gdb on a process until it acquired the mutex lock; it shows:

    (boost::interprocess::interprocess_recursive_mutex &) @0xb776f038: {m_mut = {__data = {__lock = -2147481655, __count = 1, __owner = 1993, __kind = 17, __nusers = 1, {__spins = -1208674528, __list = {__next = 0xb7f51720}}}, __size = "É\a\000\200\001\000\000\000É\a\000\000\021\000\000\000\001\000\000\000 \027õ·", __align = -2147481655}}

i.e., process id 1993 is holding it.

Now I started another gdb session and tried to lock the mutex again, but it just hangs. It's not returning the EOWNERDEAD value.

Let me know how I can check whether robust mutexes are actually in effect or not.
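A standalone test like the following (a sketch, independent of Boost, using only standard POSIX calls) should show whether the platform delivers EOWNERDEAD at all: the child locks a robust process-shared mutex and exits without unlocking it, then the parent tries to lock it.

    #include <pthread.h>
    #include <sys/mman.h>
    #include <sys/wait.h>
    #include <unistd.h>
    #include <errno.h>
    #include <stdio.h>

    int main()
    {
        // Put the mutex in an anonymous shared mapping visible to the child.
        pthread_mutex_t* m = (pthread_mutex_t*)mmap(NULL, sizeof(pthread_mutex_t),
            PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS, -1, 0);

        pthread_mutexattr_t attr;
        pthread_mutexattr_init(&attr);
        pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
        pthread_mutexattr_setrobust_np(&attr, PTHREAD_MUTEX_ROBUST_NP);
        pthread_mutex_init(m, &attr);

        if (fork() == 0) {            // child: lock the mutex and die holding it
            pthread_mutex_lock(m);
            _exit(0);
        }
        wait(NULL);                   // parent: wait for the child to exit

        int rc = pthread_mutex_lock(m);
        printf("lock returned %d (%s)\n", rc,
               rc == EOWNERDEAD ? "EOWNERDEAD: robust mutexes work"
                                : "robustness not delivered");
        return 0;
    }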

I am using Linux version 2.6.18-92.el5 ([email protected]) (gcc version 4.1.2 20071124 (Red Hat 4.1.2-42)) #1 SMP Tue Jun 10 18:49:47 EDT 2008

thanks, Suman

suman
It's urgent, please reply.
suman
I have the same problem. On Fedora 13 it does not return EOWNERDEAD for abandoned robust mutexes. It just hangs, and try lock returns EBUSY.
Maxim Yegorushkin