We're prototyping a new app and noticed that one of the actions was taking forever to load (80-120 seconds). Since a lot of the processing doesn't need to happen on page load (we can request the data via Ajax later), I thought of using Process.fork to let the page return immediately while the processing continues "behind the scenes."

We're using Apache with Passenger for the app.

A couple of things:

  1. I know about delayed_job, resque, BJ and other background job gems. We use DJ and will eventually use something like it for this as well. This is a stopgap solution while we're prototyping.

  2. I'm not concerned with server performance. The app runs on its own server, with only a handful of users trying it out.

Early tests suggest this works well, but I'm wondering whether it's a good idea. Is it going to be reliable? Is the forked process going to continue if the user navigates to another page or closes the tab/browser? After the fork has finished its work, is the process going to terminate by itself?

A: 

Yes, it's reliable as long as you use a gem that has been tested and used before. Both Delayed Job and Spawn (which I often use) have been around for quite some time and should do exactly what you expect them to.

Since the process runs in the background on your server, it should continue just fine if the user closes the tab or browser; it has no client-side attachments. When the program has finished executing, it will terminate by itself and free up the memory.
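For illustration, that lifecycle can be seen with plain Ruby outside of Rails. This is only a minimal sketch: `run_in_background` is a hypothetical helper name, and it does nothing about database connections or Passenger's process management.

```ruby
# Hypothetical helper: run the given block in a forked child process.
# Returns the thread created by Process.detach; its value is the child's
# Process::Status once the child has exited.
def run_in_background
  pid = Process.fork do
    yield
    # exit! leaves the child immediately and skips at_exit handlers
    # that were inherited from the parent process.
    exit!(0)
  end
  # Detach so the finished child is reaped and does not linger as a zombie.
  Process.detach(pid)
end
```

The parent returns right away. Calling `.value` on the returned thread blocks until the child has exited, which a web request would normally not do.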

You can read more about forking on this excellent wiki page. As a side note, don't use the Ruby fork method in Rails, since it will not play nicely with ActiveRecord.

Maran
I'm afraid you didn't really read my question... I'm aware of the DJ, Spawn and other gems, but would like to avoid using them if I can while we're prototyping. What I'm specifically after is whether just using `Process.fork` would cause problems--essentially making your "side-note" in the last paragraph the main point of your answer. Would you mind elaborating on this further?
vonconrad
I might not get it, but Spawn needs no setup; the only thing you do is call spawn and give it a block to execute in the background. This will hardly cost you hours of time. The problem with forking is that, to my knowledge, ActiveRecord is not available in a fork. I don't know the technical details, but that's why Spawn was created: to handle that along with the fork.
Maran
+1  A: 

It depends on what 'processing' means. Generally this is not going to be reliable if processing means using the Rails stack, because the master process freed by the request may be assigned to another request by Passenger and things may go wrong. Passenger may also shut down the master process, and with it the Rails instance, under some conditions (reducing the pool of idle instances, etc.).

Generally this may lead to process leakage, unexpected locks, race conditions, app errors during shutdown, etc.

I would suggest using workers running outside the Apache/Passenger stack, e.g. clustered BackgrounDRb or another solution (you mentioned Resque).

There is also another idea, which I currently use for cron jobs with my app. My crontab is just a few wget calls to actions that run long-running tasks. You can do something similar on demand from a Ruby fork with OpenURI. Imagine the application pinging itself over HTTP. The forked process doesn't need Rails anymore; it just requests the task page, and Passenger then serves that request and manages an application instance for it.

If Passenger kills the fork's parent, and thus the forked process, the other Rails instance should continue to process the HTTP request.
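A rough sketch of that self-ping idea follows. It uses Net::HTTP from the standard library rather than OpenURI so the read timeout can be raised for a slow action. The helper name `ping_in_background`, the 300-second timeout, and the URL in the usage note are assumptions made for illustration.

```ruby
require "net/http"
require "uri"

# Hypothetical helper: fork a child whose only job is to request a URL.
# The child needs nothing from Rails; the web server handles the request
# in whichever application instance it chooses.
def ping_in_background(url)
  uri = URI.parse(url)
  pid = Process.fork do
    ok = begin
      # Assumed timeout: generous, because the target action is slow.
      Net::HTTP.start(uri.host, uri.port,
                      use_ssl: uri.scheme == "https",
                      read_timeout: 300) do |http|
        http.get(uri.request_uri)
      end
      true
    rescue StandardError
      false
    end
    # Skip at_exit handlers inherited from the parent process.
    exit!(ok ? 0 : 1)
  end
  # Reap the child when it finishes so it does not become a zombie.
  Process.detach(pid)
end
```

From a controller action this might be called as `ping_in_background("http://localhost/tasks/populate")`, where the path is a made-up example of an action that runs the long task.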

gertas
Good answer, thanks. This is the sort of information I was looking for--I was curious about Passenger killing processes and so on. The problem is that while we're prototyping, I don't want to spend a lot of time or effort setting up background processing. While we're eventually going to use DJ, BackgrounDRb, Resque, or similar, they're all overkill in this instance. I just want something that returns the page while simultaneously executing one single model action that populates a couple of database tables with data.
vonconrad
I edited the answer with another idea.
gertas