Ryan Tomayko touched off quite a firestorm with this post about using Unix process control commands.

We should be doing more of this. A lot more of this. I'm talking about fork(2), execve(2), pipe(2), socketpair(2), select(2), kill(2), sigaction(2), and so on and so forth. These are our friends. They want so badly just to help us.

I have a bit of code (a delayed_job clone for DataMapper) that I think would fit right in with this, but I'm not clear on how to take advantage of the listed commands. Any ideas on how to improve this code?

def start
  say "*** Starting job worker #{@name}"
  t = Thread.new do
    loop do
      delay = Update.work_off(self)
      break if $exit
      sleep delay
      break if $exit
    end
    clear_locks
  end

  trap('TERM') { terminate_with t }
  trap('INT')  { terminate_with t }

  trap('USR1') do
    say "Wakeup Signal Caught"
    t.run
  end
end
A: 

Ahh yes... the dangers of "We should do more of this" without explaining what each of those calls does and in what circumstances you'd use it. With something like delayed_job, you may even be using fork without knowing it. That said, it really doesn't matter here. Ryan was talking about using fork for preforking servers; delayed_job would use fork to turn a process into a daemon. Same system call, different purposes. Running delayed_job in the foreground (without fork) versus in the background (with fork) makes a negligible performance difference.

However, if you're writing a server that accepts concurrent connections, Ryan's advice is right on the money.

  • fork: creates a copy of the original process
  • execve: replaces the program running in the current process with a new one, keeping the same process ID (very useful in rake tasks)
  • pipe: creates a pipe (two file descriptors, one for read, one for write)
  • socketpair: like a pipe, but creates a pair of connected sockets, so both ends can read and write
  • select: lets you wait, with an optional timeout, until one or more of a set of file descriptors is ready
  • kill: used to send a signal to a process
  • sigaction: lets you change what happens when a process receives a signal
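
To make the list a bit more concrete, here is a minimal Ruby sketch that touches fork, pipe, select, kill, and signal handling (Ruby's `Signal.trap` plays the role of sigaction) in one round trip. The method name `run_child_demo` and the "ready" message are made up for illustration; this is one possible arrangement, not how delayed_job or Ryan's examples do it.

```ruby
# Hypothetical demo: the parent forks a child, waits for it to report in over
# a pipe, then asks it to stop with a signal.
def run_child_demo
  reader, writer = IO.pipe               # pipe: one end to read, one to write

  pid = fork do                          # fork: this block runs in the child
    reader.close                         # the child only writes
    Signal.trap('TERM') { exit!(0) }     # handler installed before reporting in
    writer.puts 'ready'
    sleep                                # idle until a signal arrives
  end

  writer.close                           # the parent only reads

  # select: wait up to 5 seconds for the child's end of the pipe to be readable
  ready, = IO.select([reader], nil, nil, 5)
  message = ready ? reader.gets.to_s.chomp : nil

  Process.kill('TERM', pid)              # kill: send a signal to the child
  _, status = Process.wait2(pid)         # reap the child and collect its status
  reader.close

  { message: message, exit_code: status.exitstatus }
end
```

Calling `run_child_demo` returns a hash with the message the child sent and its exit code. The child installs its `TERM` handler before writing to the pipe, so the parent cannot send the signal too early.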
Bob Aman
So instead of creating a (green) Thread, I could use fork(2) and get a PID back. This much I have done. How do I work with the block for trap('USR1'), which wakes up the thread when a new job hits the queue? How would I create more than one process and get them to pull off the queue optimally? The magic seems to be in pipe and select, but I don't understand the intricacies. I can use the daemons gem to create a daemon process, which uses fork under the hood. I want to start some child processes to run the queue.
John F. Miller
Depends on what "work" means in this context. If "work" is I/O, it might be different and yeah, `select` is important. I prefer to design work queues in such a way that work can be executed in any order. So if work unit #1 gets executed concurrently with work unit #2, that's not an issue. As for the daemons gem... I used to use it, but now I either use `fork` directly or I use my ChainGang library. I found that daemons hid too much important stuff, and it was unnecessary overhead for something that's actually not that hard. Note that ChainGang is very alpha quality.
Bob Aman
Essentially, worrying about "pulling off the queue optimally" doesn't make a whole lot of sense in most contexts. Either you're working with I/O, in which case you're probably not using a `delayed_job` queue, or you are executing things as discrete work units, in which case `select` and `pipe` don't make any sense. Either way, the right way to deal with a work queue is to just grab whatever's on top whenever you're free. And if there's nothing there, it's probably OK to just block until there is.
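
As a rough sketch of "grab whatever's on top, otherwise wait", assuming the queue is any object whose `shift` returns the next job (a callable) or `nil` when nothing is waiting. The name `drain` and the `max_idle_polls` cutoff are hypothetical; the cutoff exists only so the example terminates, and a real worker would keep looping.

```ruby
# Hypothetical worker loop: take the next job if there is one, otherwise
# nap briefly and look again.
def drain(queue, idle_delay: 0.05, max_idle_polls: 3)
  finished = []
  idle_polls = 0

  while idle_polls < max_idle_polls
    job = queue.shift                    # grab whatever is on top
    if job
      idle_polls = 0
      finished << job.call               # run it; order doesn't matter
    else
      idle_polls += 1
      sleep idle_delay                   # nothing there: wait, then retry
    end
  end

  finished
end
```

Several worker processes can each run a loop like this against the same queue, provided taking a job off the queue is atomic (for a database-backed queue, that is the job of the locking step).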
Bob Aman
As for `USR1`, you could have whatever pushes to the queue send `USR1` to all of the worker processes, but I'm not at all sure this is necessary. May not even be desirable. If you were pushing a bazillion jobs onto the queue on a regular basis, there wouldn't be much point. This would only be the sort of thing you'd do if your worker process is *always* blocking, waiting for work, and you might go several seconds at a time or more between work performed.
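
If you do want the wake-up, one way to sketch it in a forked worker is the "self-pipe" arrangement: the `USR1` handler writes a byte to a pipe, and the worker idles in `IO.select` on that pipe instead of in `sleep`. Everything named here (`wake_worker_demo`, the "idle"/"woken" messages) is hypothetical and not taken from the question's code.

```ruby
# Hypothetical demo: a worker that idles for up to idle_delay seconds but
# wakes immediately when it receives USR1.
def wake_worker_demo(idle_delay: 30)
  report_reader, report_writer = IO.pipe

  pid = fork do
    report_reader.close
    wake_reader, wake_writer = IO.pipe
    # The handler only pokes the pipe; the main loop notices via select.
    Signal.trap('USR1') { wake_writer.write_nonblock('.') }
    report_writer.puts 'idle'            # tell the parent the handler is set

    woken = IO.select([wake_reader], nil, nil, idle_delay)
    report_writer.puts(woken ? 'woken' : 'timed out')
    exit!(0)
  end

  report_writer.close
  report_reader.gets                     # wait for the worker to go idle
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  Process.kill('USR1', pid)              # what the code pushing a job would do
  outcome = report_reader.gets.to_s.chomp
  Process.wait(pid)
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  report_reader.close

  { outcome: outcome, elapsed: elapsed }
end
```

With several workers, whatever pushes to the queue would loop over the worker PIDs and call `Process.kill('USR1', pid)` for each. Because the signal leaves a byte in the pipe, a wake-up that arrives just before the worker starts waiting is not lost.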
Bob Aman
A: 

I'm working on the same thing. I want the parent process to be in charge of finding the scheduled jobs that need to run, and then hand each job off to a child process to actually do the work.
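
A minimal sketch of that hand-off, assuming each job is a plain Ruby callable keyed by name. `run_jobs_in_children` is a hypothetical helper, and finding the scheduled jobs is left out entirely.

```ruby
# Hypothetical helper: the parent forks one child per job, then waits for
# each child and records whether its job succeeded.
def run_jobs_in_children(jobs)
  names_by_pid = {}

  jobs.each do |name, job|
    pid = fork do
      ok = begin
        job.call                         # the child does the actual work
        true
      rescue StandardError
        false
      end
      exit!(ok ? 0 : 1)                  # report the outcome via exit status
    end
    names_by_pid[pid] = name
  end

  results = {}
  names_by_pid.each do |pid, name|
    _, status = Process.wait2(pid)       # reap the child, read its status
    results[name] = status.success?
  end
  results
end
```

For example, `run_jobs_in_children('ok' => -> { 1 + 1 }, 'bad' => -> { raise 'boom' })` reports `true` for the first job and `false` for the second. A failing job takes down only its own child, which is a large part of the appeal of this layout.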

bkeepers
A: 

Five months later, you can view my solution at http://github.com/antarestrader/Updater. Look at lib/updater/fork_worker.rb.

John F. Miller