views: 149

answers: 4
I am designing a Ruby on Rails application that requests XML feeds, reads them in, and parses them into objects to be used in views. Since requesting and receiving an XML feed can take several seconds from some sources, I need a way to offload these tasks from my front-line application tier; I do not want my application servers to spend more than a few hundred milliseconds processing a request. Currently the application-serving processes sit and wait for the XML feed data to be returned so they can parse it and finish rendering the user's request. I am aware of DelayedJob, but since the result of this action must be returned to the user in real time, I am unsure how to offload it to a background task and still receive the result.

If I offload this task to a background task how does the result get returned to the user loading the page?

+1  A: 

One common model for this sort of thing is to use your preferred background job library (you mention DelayedJob, which seems to be a popular one) to offload the task from the request/response cycle, and then set up AJAX polling on the client to update the page with the results once they become available.
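In plain Ruby, the enqueue-then-poll shape described above looks roughly like this. This is an in-process sketch only: the `JOB_RESULTS` hash and a background thread stand in for the shared store (database or cache) and worker process a real DelayedJob setup would use, and all names are hypothetical.

```ruby
require 'securerandom'

# In a real app this store would be a database table or cache
# shared between the web and worker processes.
JOB_RESULTS = {}
JOB_RESULTS_MUTEX = Mutex.new

# "Enqueue": run the slow feed fetch off the request cycle and
# return a token the client can poll with via AJAX.
def enqueue_feed_fetch(url)
  job_id = SecureRandom.hex(8)
  Thread.new do
    result = "parsed feed from #{url}" # placeholder for fetch + parse
    JOB_RESULTS_MUTEX.synchronize { JOB_RESULTS[job_id] = result }
  end
  job_id
end

# The polling endpoint: returns nil until the worker has finished.
def poll_result(job_id)
  JOB_RESULTS_MUTEX.synchronize { JOB_RESULTS[job_id] }
end

job_id = enqueue_feed_fetch("http://example.com/feed.xml")
sleep 0.1 until (result = poll_result(job_id))
puts result
```

On the client, the AJAX side would hit the polling endpoint with the returned `job_id` every second or so and swap the result into the page when it arrives.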

Greg Campbell
This requires storing the results of the XML feed in a table. I'd rather not do that, as it will cause a significant number of reads and writes.
Jared Brown
A: 

If you are open to Perl code running on your server, I'd lift a piece of LiveJournal infrastructure: Gearman and TheSchwartz - http://danga.com/gearman/ and http://search.cpan.org/~bradfitz/TheSchwartz-1.07/lib/TheSchwartz.pm

Sounds like you want Gearman - and it has Ruby client bindings.

(see http://www.livejournal.com/doc/server/lj.install.workers_setup_install.html )

Valters Vingolds
This appears similar to DelayedJob. My question, though, is not how to delay the task, but how to get the results back to the user if I do delay it.
Jared Brown
http://code.sixapart.com/svn/gearman/trunk/api/ruby/examples/
Valters Vingolds
A: 

You can have your main returned page fire an AJAX request at a second tier of servers that handle the XML retrieval, and return HTML for the section of the page that will contain that information. That way you aren't running any asynchronous jobs (from the server's point of view) and the retrieval won't start until the AJAX request comes in, which will reduce the bandwidth you waste on bots.
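The server-side handler for that AJAX request could look something like the sketch below: fetch the feed, parse it, and return only the HTML fragment for that section of the page. This is a hedged, stdlib-only illustration, not Rails controller code; the fetch is stubbed with a literal XML string so it runs standalone (in a real action you would use `Net::HTTP.get(URI(url))` or similar).

```ruby
require 'rexml/document'

# Stub for the slow network call; a real implementation would
# fetch the URL with Net::HTTP. The feed structure is invented.
def fetch_feed_xml(url)
  <<~XML
    <feed>
      <item><title>First story</title></item>
      <item><title>Second story</title></item>
    </feed>
  XML
end

# Parse the feed and render just the page section's HTML fragment,
# which the AJAX call inserts into the already-loaded page.
def feed_fragment(url)
  doc = REXML::Document.new(fetch_feed_xml(url))
  titles = doc.elements.to_a("//item/title").map(&:text)
  "<ul>" + titles.map { |t| "<li>#{t}</li>" }.join + "</ul>"
end

html = feed_fragment("http://example.com/feed.xml")
puts html
```

Because the fetch only starts when the AJAX request arrives, the initial page render stays fast, and crawlers that don't run JavaScript never trigger it.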

This is a standard use of AJAX, so I'm not sure whether I'm missing something in your problem that makes it inappropriate for you.

Michael Sofaer
A: 

The most common approach here is AJAX plus DelayedJob, but that is only a usability improvement: instead of waiting 5 seconds for the page to load, the user gets an empty or half-empty page with a spinner for 5 seconds. The only way (in my opinion) to really improve the user experience is to fetch and process those XML feeds periodically in the background and show the user a cached result.
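The cache-and-refresh idea can be sketched in plain Ruby as below. In Rails you would more likely use `Rails.cache.fetch` with `:expires_in` plus a cron or recurring background job to refresh; this `FeedCache` class and its names are hypothetical, purely to show the shape.

```ruby
# Minimal TTL cache: the slow fetch block runs only when the
# entry is missing or stale, so user requests stay fast.
class FeedCache
  Entry = Struct.new(:value, :expires_at)

  def initialize(ttl_seconds)
    @ttl = ttl_seconds
    @store = {}
    @mutex = Mutex.new
  end

  def fetch(key)
    @mutex.synchronize do
      entry = @store[key]
      if entry.nil? || Time.now > entry.expires_at
        entry = Entry.new(yield, Time.now + @ttl)
        @store[key] = entry
      end
      entry.value
    end
  end
end

cache = FeedCache.new(300) # refresh at most every 5 minutes
calls = 0
2.times { cache.fetch("feed") { calls += 1; "parsed feed" } }
puts calls # the slow fetch ran only once
```

The trade-off is freshness: users see data up to one TTL old, but no request ever blocks on the upstream feed.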

psyho