I've learned that you should usually stick with either forking or threading to avoid running into very strange and extremely hard-to-debug problems, so until now I have always done exactly that. My trouble is that when I stick with forking alone, creating many short-lived processes to distribute chunks of work gets more expensive the more CPU cores I want to feed, until at some point performance simply stops scaling reasonably. Using only threads, on the other hand, I have to be ever so careful about which libraries I use and generally be extremely defensive about thread-safety, which eats up lots of precious development time and forces me to give up some favourite libraries. So, even though I've been warned, the thought of mixing forking and threading appeals to me on a number of levels.
Now, from what I've read so far, problems always seem to arise when threads already exist at the moment the fork happens.
Suppose I designed a system that starts up, daemonizes, forks off its main tiers, and never forks again after that; then I should be perfectly safe and robust. If some of those pre-forked tiers were now to start using threads to spread their workload over many CPU cores, so that the various child processes never know about each other's threads, would that still be safe? I can guarantee that each threaded tier is itself thread-safe and that the non-thread-safe tiers will never start a thread of their own.
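To make this concrete, here is a rough sketch of the startup sequence I have in mind; the tier names, worker counts, and the `run_tier`/`work` helpers are all made up for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use threads;                    # loaded early, but threads are only ever
                                # created *after* all forking is done
use POSIX qw(setsid);

sub daemonize {
    defined(my $pid = fork) or die "fork: $!";
    exit 0 if $pid;                 # parent returns to the shell
    setsid or die "setsid: $!";     # detach into a new session
}

sub work {
    my ($id) = @_;
    # ... per-core workload would go here ...
}

sub run_tier {
    my ($tier) = @_;
    if ($tier->{threaded}) {
        # thread-safe tier: fan out over the cores with threads
        my @thr = map { threads->create(\&work, $_) } 1 .. $tier->{workers};
        $_->join for @thr;
    }
    else {
        work(0);                    # non-thread-safe tier stays single-threaded
    }
}

daemonize();

# fork off the main tiers exactly once, right after daemonizing;
# no process ever forks again after this point
my @tiers = (
    { name => 'frontend', threaded => 0, workers => 1 },
    { name => 'cruncher', threaded => 1, workers => 4 },
);

my @kids;
for my $tier (@tiers) {
    defined(my $pid = fork) or die "fork: $!";
    if ($pid == 0) {
        run_tier($tier);            # children never see each other's threads
        exit 0;
    }
    push @kids, $pid;
}
waitpid $_, 0 for @kids;
```

The property I'm relying on is that by the time a threaded tier calls `threads->create`, it is a single-threaded process, so the fork-with-live-threads hazard should never arise.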
While I feel quite safe about this approach, I'd appreciate a few professional opinions on the matter, pointing out possible caveats, interesting points of view, links to further reading, and so on. The language I personally use is Perl on Debian, RedHat, SuSE, and OS X, but the topic should be general enough to apply to any language on any Un*x/BSD-like platform that behaves remotely POSIXish, maybe even Interix.