I frequently have some code that should be run either on a schedule or as a background process with some parameters. The common element is that they are run outside the dispatch process, but need access to the Rails environment (and possibly the parameters passed in).

What's a good way to organize this and why? If you like to use a particular plugin or gem, explain why you find it convenient--don't just list a plugin you use.

A: 

For me, not wanting to maintain a lot of extra infrastructure is a key priority, so I have used database-backed queues that are run outside of Rails.

In my case, I've used background_job and delayed_job. With background_job, the worker was kept running via cron, so there was no daemon management. With delayed_job, I'm using Heroku and letting them worry about that.

With delayed_job you can pass in as many arguments as your background worker needs to run.

Delayed::Job.enqueue(MyJob.new(params[:one], params[:two], params[:three]))
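delayed_job will call #perform on whatever object you enqueue, so a MyJob like the one above might be sketched as follows (the class name and parameters are hypothetical, taken from the enqueue call):

```ruby
# Hypothetical MyJob for the enqueue call above: delayed_job only
# requires that the enqueued object respond to #perform.
class MyJob < Struct.new(:one, :two, :three)
  def perform
    # do the slow work here, outside the request/response cycle
  end
end
```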

I have not found a good solution to running stuff on a schedule, aside from using script/runner via cron (I prefer to use script/runner over a Rake task because I find it easier to test the code).

I've never had to have a regularly scheduled background process that needed access to a particular Rails request so that hasn't been too much of a problem.

I know there are other, cooler systems with more features but this has worked OK for me, and helps me avoid dealing with setting up a lot of new services to manage.

Luke Francl
I'd like to add that, while Yehuda has accepted this answer, I don't consider what works best for me to be the best for other people. My priorities as a terrible sysadmin are to reduce sysadmin tasks :) If you have more skills there or need a higher-performance solution, by all means try out one of the more esoteric queuing systems.
Luke Francl
+1  A: 

I have a system that receives requests and then needs to call several external systems using web services. Some of these requests take longer than a user can be expected to wait, and I use an enterprise queuing system (ActiveMQ) to handle them.

I am using the ActiveMessaging plugin to do this. It allows me to marshal the request and place it on a queue for asynchronous processing with access to the request data; however, you will need to write a polling service if you want to wait for the response.
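An ActiveMessaging processor looks roughly like the sketch below. The processor name and queue are made up, and the stub superclass stands in for the plugin's ApplicationProcessor so the sketch runs on its own; with the plugin installed, subscribes_to wires the class to a real broker destination.

```ruby
# Stand-in for the plugin's ApplicationProcessor so this sketch is
# self-contained; with ActiveMessaging you'd inherit the real class.
class ApplicationProcessor
  def self.subscribes_to(queue)
    @queue = queue
  end
end

# Hypothetical processor: receives marshalled request data from the
# queue and calls out to the slow external web service.
class ExternalCallProcessor < ApplicationProcessor
  subscribes_to :external_requests

  def on_message(message)
    # unmarshal the request data and invoke the web service here
  end
end
```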

I have seen Ryan Bates' Railscast on Starling and Workling, and they look promising, but I haven't used them.

Steve Weet
A: 

For regularly scheduled tasks, I just use rake tasks. It's simple, easily tested, easily understood and integrates well with the Rails environment. Then just execute these rake tasks with a cron job at whatever interval you require (I use whenever to manage these jobs because I'm slightly cron-illiterate).

hopeless
+3  A: 

I really don't like gems like delayed_job and background_job that persist to a database for the purpose of running asynchronous jobs. It just seems dirty to me. Transient stuff doesn't belong in a database.

I'm a huge fan of message queues for dealing with asynchronous tasks, even when you don't have the need for massive scalability. The way I see it, message queues are the ideal "lingua franca" for complex systems. With a message queue, in most cases, you have no restriction on the technologies or languages that are involved in whatever it is that you're building. The benefits of low-concurrency message queue usage are probably most noticeable in an "enterprisey" environment where integration is always a massive pain. Additionally, message queues are ideal when your asynchronous workflow involves multiple steps. RabbitMQ is my personal favorite.

For example, consider the scenario where you're building a search engine. People can submit URIs to be indexed. Obviously, you don't want to retrieve and index the page in-request. So you build around a message queue: The form submission target takes the URI, throws it in the message queue to be indexed. The next available spider process pops the URI off the queue, retrieves the page, finds all links, pushes each of them back onto the queue if they are unknown, and caches the content. Finally, a new message is pushed onto a second queue for the indexer process to deal with the cached content. Indexer process pops that message off the queue, and indexes the cached content. Oversimplified of course — search engines are a lot of work, but you get the idea.
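That flow can be sketched with Ruby's thread-safe Queue standing in for the two RabbitMQ queues (the fetch and index steps are stubbed out; the URI is made up):

```ruby
require "thread"

# In-process stand-ins for the two broker queues.
spider_queue  = Queue.new
indexer_queue = Queue.new

# Form submission target: throw the submitted URI onto the queue.
spider_queue << "http://example.com/"

# Spider: pop a URI, fetch and cache the page (stubbed here), push
# unknown links back onto spider_queue, hand the content onward.
uri = spider_queue.pop
cached = { :uri => uri, :body => "<html>...</html>" }
indexer_queue << cached

# Indexer: pop the cached page and index it.
page = indexer_queue.pop
# index(page[:body]) would happen here
```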

As for the actual daemons, obviously, I'm partial to my own library (ChainGang), but it's really just a wrapper around Kernel.fork() that gives you a convenient place to deal with setup and teardown code. It's also not quite done yet. The daemon piece is far less important than the message queue, really.
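The pattern being wrapped can be sketched in a few lines of plain Kernel.fork plus a signal trap; the setup and teardown hooks are what a library like ChainGang adds on top:

```ruby
# Minimal fork-and-loop worker: the parent gets the child's pid back;
# the child loops until it receives TERM.
pid = Kernel.fork do
  running = true
  trap("TERM") { running = false }
  # setup code (load a slim environment, connect to the queue) here
  sleep 0.05 while running
  # teardown code (close connections) here
end
```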

Regarding the Rails environment, well, that's probably best left as an exercise for the reader, since memory usage is going to be a significant factor with a long-running process. You don't want to load anything you don't have to. Incidentally, this is one area where DataMapper kicks ActiveRecord's butt soundly. Environment initialization is well-documented, and there are far fewer dependencies that come into play, making the whole kit and caboodle significantly more realistic.

The one thing I don't like about cron+rake is that rake is virtually guaranteed to print to standard output, and cron tends to be excessively chatty if your cron jobs produce output. I like to put all my cron tasks in an appropriately named directory, then make a rake task that wraps them, so that it's trivial to run them manually. It's a shame that rake behaves this way, because I'd really prefer to have the option to take advantage of dependencies. In any case, you just point cron directly at the scripts rather than running them via rake.

I'm currently in the middle of building a web app that relies heavily on asynchronous processes, and I have to say, I'm very, very glad I decided not to use Rails.

Bob Aman
Out of curiosity, what did you decide to use?
Yehuda Katz
Sinatra, DataMapper, Xapian, RabbitMQ
Bob Aman
It sounds like you've either spent significantly more time on architecture than most would be willing to, or have a significantly more complex application than usual. Thoughts?
Yehuda Katz
I probably wouldn't say that actually. The application in question is a largely feature-free search engine for recipes that an iPhone application can call out to. It has well under 10 distinct pages, and will probably mainly be accessed as a web-service. The vast majority of the complexity is in the asynchronous processes that handle spidering, parsing, and indexing. However, even then, it's mostly a bunch of libraries with as little glue code in between as possible. Atypical? Definitely. But I feel like it's one of the more elegant web apps I've ever made. I suspect that's relevant.
Bob Aman
An additional advantage of the message queue over stuff like delayed_job and background_job is simply that there are circumstances where it is absolutely necessary. Virtually anything you can do with those two libraries can also be done with a proper message queue. Yes, it can be argued that it's a sledgehammer cracking a walnut, but there is also something to be said for being extremely well-acquainted with what is, at least IMHO, an essential piece of technology.
Bob Aman