I've been working on a small sandboxed example to help me figure out how to use rwlocks. Everything seems fairly straightforward, but I get deadlocks in my example every once in a while and don't understand why.

I've put the code example on pastebin because it's more than a few lines of code: http://pastebin.org/359203

If you run the example, when it eventually deadlocks, the last three print statements will be one of two cases:

one:

th4: request lock
th3: request lock
th4: locked

two:

th3: request lock
th4: request lock
th3: locked

Based on the output, it seems to me that the eventual deadlock comes from a second call to a locking function, whether it's a read lock or a write lock. But since one of the threads already has the lock, and that same thread makes the second locking call, why does it deadlock? Even more interesting, what is it about this small case that causes the deadlock?

Note that I'm on Mac OS X, and that this is a seriously contrived example. It's sandboxed from something else I'm working on, and I wanted to make sure I get this part right.

+1  A: 

Your problem is that pthread_rwlock_wrlock(3) is not reentrant. The documentation clearly states that the results of calling this function when the thread already holds the lock are undefined. Your code specifically calls the function twice without releasing the lock in between.

Jason Coco
Yeah, I see that. I simplified the code just for the example. On Mac OS X, pthread_rwlock_wrlock returns EDEADLK if the caller already owns the lock, and the unlock only works if the caller owns the lock. So for this example it doesn't matter. Also note that it's almost always the second read lock call that causes the deadlock, not the write lock.
Aaron Smith
@Aaron Smith: No, Mac OS X uses fast locks by default and they don't do this error checking. They will do exactly what the documentation says they will, which is behave in an undefined way (which is probably what you're seeing when you over-unlock the write lock). If you want the call to fail with error checking, you have to set the appropriate attributes when you create the lock.
Jason Coco
Hm, but if I store the results of two successive calls to pthread_rwlock_wrlock, the first is 0 and the second is EDEADLK. With pthread_rwlock_unlock, the first is 0 and the second is EPERM.
Aaron Smith
@Aaron Smith: It doesn't matter, once you call that second wrlock all bets are off. Even if you get those error codes and are using error-checked locks, all bets are /still/ off. It's basically a programming error to call it twice from the same thread.
Jason Coco
OK, that makes sense. It's probably better to deal with the undefined behavior so it's portable anyway.
Aaron Smith
+1  A: 

pthread_rwlock supports recursive read locking, but not recursive write locking. If you write lock the lock while you already hold it, you have entered the realm of undefined behavior. This is the case for your thfn3().

It's clearer if you call the threads the "reader" (thfn4) and the "writer" (thfn3). Case one is then:

  • reader tries to lock
  • writer tries to lock and blocks waiting for reader to release lock
  • reader gets lock
  • reader tries to lock again and blocks waiting for writer to acquire lock and release lock

In this case, the reader is likely unable to lock again because there is a writer waiting on the lock, and would-be writers block would-be readers.

Case two is:

  • writer tries to lock
  • reader tries to lock and blocks waiting for writer to finish
  • writer gets lock
  • writer tries to lock again and blocks

This case can likely only be explained by appeal to details of the rwlock implementation.

Jeremy W. Sherman
Yeah, I've already done a bunch of tests that keep track of all this. What I'm experiencing is that it doesn't matter. A call to pthread_rwlock_wrlock returns EDEADLK when the calling thread already owns the lock, and pthread_rwlock_unlock returns EPERM if the calling thread doesn't own the lock. So for the sake of this example it doesn't matter. The implementation will only allow one wrlock, and will only allow the owning thread to correctly unlock the wrlock.
Aaron Smith
That kind of makes sense. What I don't get is case 1. The output clearly shows that the first read lock was acquired, so why does the second call to read lock get blocked when the calling thread already has the read lock? That's what recursive locks are: the call should succeed because the calling thread already holds the read lock.
Aaron Smith
A: 

See the bug I reported to Apple. This is the problem.

https://bugreport.apple.com/cgi-bin/WebObjects/RadarWeb.woa/7/wo/0blX77DJS8lBTTxVnTsNDM/5.83.28.0.13

Aaron Smith
No-one but you and Apple employees will be able to see the radar report. Please copy the information to [OpenRadar](http://openradar.appspot.com/).
Jeremy W. Sherman
Thanks for the info. I didn't know that.
Aaron Smith
A: 

Here's the open radar bug.

http://openradar.appspot.com/8588290

Aaron Smith
Dude, I told you that the behavior is undefined. They say that very clearly in their documentation and the POSIX spec says it as well. The fact that Linux works the way you want it to work is nice, but if you want something to be portable, you have to go by the POSIX docs and you have to respect the docs for your target platform.
Jason Coco