I've learned that you should usually stick with either forking or threading to avoid running into very strange and extremely hard-to-debug problems, so until now I have always done exactly that. My trouble is that when I stick with forking alone, creating many short-lived processes to distribute chunks of work gets more expensive the more CPU cores I want to feed, until at some point performance simply stops scaling reasonably. Using only threads, on the other hand, I have to be ever so careful about which libraries I use and generally be extremely defensive about thread-safety, which eats up lots of precious development time and forces me to give up some favourite libraries. So, even though I've been warned, the thought of mixing forking and threading appeals to me on a number of levels.
Now, from what I've read so far, problems always seem to arise when threads already exist at the moment the fork happens.
Suppose I designed a system that starts up, daemonizes, forks off its main tiers, and never forks again after that; then I should be perfectly safe and robust. If some of those pre-forked tiers were now to start using threads to spread their workload over many CPU cores, so that the various child processes never know about each other's threads, would that still be safe? I can guarantee that each threaded tier is itself thread-safe and that the non-thread-safe tiers will never start a thread of their own.
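To make this concrete, here is a rough sketch of the startup sequence I have in mind; the tier names, worker counts, and the `run_tier`/`work` helpers are all made up for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use threads;                    # loaded early, but threads are only ever
                                # created *after* all forking is done
use POSIX qw(setsid);

sub daemonize {
    defined(my $pid = fork) or die "fork: $!";
    exit 0 if $pid;                 # parent returns to the shell
    setsid or die "setsid: $!";     # detach into a new session
}

sub work {
    my ($id) = @_;
    # ... per-core workload would go here ...
}

sub run_tier {
    my ($tier) = @_;
    if ($tier->{threaded}) {
        # thread-safe tier: fan out over the cores with threads
        my @thr = map { threads->create(\&work, $_) } 1 .. $tier->{workers};
        $_->join for @thr;
    }
    else {
        work(0);                    # non-thread-safe tier stays single-threaded
    }
}

daemonize();

# fork off the main tiers exactly once, right after daemonizing;
# no process ever forks again after this point
my @tiers = (
    { name => 'frontend', threaded => 0, workers => 1 },
    { name => 'cruncher', threaded => 1, workers => 4 },
);

my @kids;
for my $tier (@tiers) {
    defined(my $pid = fork) or die "fork: $!";
    if ($pid == 0) {
        run_tier($tier);            # children never see each other's threads
        exit 0;
    }
    push @kids, $pid;
}
waitpid $_, 0 for @kids;
```

The property I'm relying on is that by the time a threaded tier calls `threads->create`, it is a single-threaded process, so the fork-with-live-threads hazard should never arise.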
While I feel quite safe about this approach, I'd appreciate a few professional opinions on the matter, pointing out possible caveats, interesting points of view, links to further reading, and so on. The language I personally use is Perl on Debian, RedHat, SuSE, and OS X, but the topic should be general enough to apply to any language on any Un*x/BSD-like platform that behaves remotely POSIXish, maybe even Interix.