I'd like some thoughts on whether using fork{} to 'background' a process from a rails app is such a good idea or not...

From what I gather, fork { my_method; Process.setsid } does in fact do what it's supposed to do:

1) creates another process with a different PID

2) doesn't interrupt the calling process (e.g. it continues w/o waiting for the fork to finish)

3) executes the child until it finishes

...which is cool, but is it a good idea? What exactly is fork doing? Does it create a duplicate instance of my entire Rails mongrel/passenger instance in memory? If so, that would be very bad. Or does it somehow do it without consuming a huge swath of memory?

My ultimate goal was to do away with my background daemon/queue system in favor of forking these processes (primarily sending emails) -- but if this won't save memory then it's definitely a step in the wrong direction.

A: 

The semantics of fork is to copy the entire memory space of the process into a new process, but many (most?) systems will do that by just making a copy of the virtual memory tables and marking it copy-on-write. That means that (at first, at least) it doesn't use that much more physical memory, just enough to make the new tables and other per-process data structures.
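A minimal sketch of those semantics, assuming a Unix-like system where Ruby's fork is available. The variable names are made up for illustration. The child starts with a copy of the parent's data, and a change made in the child never shows up in the parent. It does not measure physical memory; it only shows that the two address spaces are logically separate after the fork.

```ruby
# Hypothetical data that exists in the parent before the fork.
data = ["original"]

# A pipe lets the child report back what it saw.
reader, writer = IO.pipe

pid = fork do
  reader.close
  data << "changed in child"   # touches only the child's copy
  writer.puts data.size
  writer.close                 # flushes the report to the pipe
  exit!(0)                     # leave without running the parent's exit handlers
end

writer.close
child_size = reader.read.to_i
reader.close
Process.wait(pid)

parent_size = data.size
puts "child saw #{child_size} items, parent still has #{parent_size}"
```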

That said, I'm not sure how well Ruby, RoR, etc. interact with copy-on-write forking. In particular, garbage collection could be problematic if it touches many memory pages (causing them to be copied).

wdebeaum
I've heard both things about COW... pretty sure some of the 1.8 branch did not support it, but REE does(?). And I've heard both that 1.9 does and does not support COW. That said, *even if it does*, imagine my rails action: def foo; do_stuff; fork_and_send_email; do_more_stuff; end. Even if the fork is COW, wouldn't the original memory location be instantly changed (because of what comes after it) and thus instigate a copy? Even if the fork was the last method call, I'd imagine rails still does stuff after it, not to mention... the next request coming in on the same process.</talkingOutOfMyAss>
crankharder
dammit, comment formatting: def foo; do_stuff; fork_and_send_email; do_more_stuff; end
crankharder
Well, yes, some copying would occur, but hopefully it wouldn't be the entire memory space of the process; rather it would be individual memory pages here and there (on x86 memory pages are usually 4 kilobytes).
wdebeaum
Wait a second, COW is something the OS kernel provides to everything; it doesn't depend on application versions. And yes, it's page-by-page, so one write only triggers a copy of one page.
DigitalRoss
+3  A: 

Fork does make a copy of your entire process and, depending on exactly how you are hooked up to the application server, a copy of that as well. As noted in the other discussion, this is done with copy-on-write, so it's tolerable. Unix is built around fork(2), after all, so it has to manage it fairly fast. Note that any partially buffered I/O, open files, and lots of other state are also copied, including buffers that are primed to be written out on exit; if the child flushes them too, the same output gets written twice, which would be incorrect.
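One way to sidestep the buffered-I/O problem, sketched here under the assumption of a Unix-like system: flush anything pending before forking, so the child inherits empty buffers, and have the child leave with exit!. The temp file and its contents are placeholders, not anything from the original app.

```ruby
require "tempfile"

# Hypothetical log file standing in for any buffered output the app holds.
log = Tempfile.new("fork-demo")
log.write("queued by parent\n")   # may still sit in Ruby's userspace buffer
log.flush                          # hand it to the OS before forking

pid = fork do
  # Child: the background work would go here.
  # exit! skips at_exit handlers inherited from the parent.
  exit!(0)
end
Process.wait(pid)

log.close
contents = File.read(log.path)
puts contents                      # the line should appear exactly once
log.unlink
```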

I have a few thoughts:

  • Are you using Action Mailer? It seems like email would be easily done with AM or by IO.popen of something. (popen will do a fork, but it is immediately followed by an exec.)
  • if you do fork, immediately get rid of all that state by calling Process.exec on another ruby interpreter plus your functionality. If there is too much state to transfer or you really need to use those duplicated file descriptors, you might do something like IO.popen instead so you can send the subprocess work to do. The system will share the pages containing the text of the Ruby interpreter of the subprocess with the parent automatically.
  • in addition to the above, you might want to consider the use of the daemons gem. While your rails process is already a daemon, using the gem might make it easier to keep one background task running as a batch job server, and make it easy to start, monitor, restart if it bombs, and shut down when you do...
  • if you do exit from a fork(2)ed subprocess, use exit! instead of exit; exit! skips the at_exit handlers the child inherited from the parent
  • having a message queue and a daemon already set up, like you do, kinda sounds like a good solution to me :-)
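As a rough sketch of the IO.popen suggestion in the list above: the parent starts a fresh Ruby interpreter and sends it work over a pipe, so none of the parent's in-memory state travels along. The inline worker script and the run_in_subprocess helper are hypothetical stand-ins; a real mail job would do its sending where this one just upcases the input.

```ruby
require "rbconfig"

# Hypothetical worker: reads its job from stdin and reports on stdout.
WORKER_SCRIPT = "print STDIN.read.upcase"

# Hypothetical helper: runs the worker in a new interpreter (fork + exec
# under the hood) and returns whatever the worker printed.
def run_in_subprocess(payload)
  IO.popen([RbConfig.ruby, "-e", WORKER_SCRIPT], "r+") do |io|
    io.write(payload)   # send the work to the subprocess
    io.close_write      # signal end of input
    io.read             # collect the worker's reply
  end
end

result = run_in_subprocess("send welcome email")
puts result
```

This version waits for the reply, which keeps the sketch easy to check; a fire-and-forget job would skip the read and reap the child later, for example with Process.detach.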
DigitalRoss
+1  A: 

Be aware that it will prevent you from using JRuby on Rails as fork() is not implemented (yet).

Redbeard