views: 2547
answers: 3

The latest Google App Engine release supports a new Task Queue API in Python. I was comparing the capabilities of this API with those of the already existing Cron service, for background jobs that are not user-initiated, such as grabbing an RSS feed and parsing it at a daily interval. Can and should the Task Queue API be used for non-user-initiated requests such as this?

+3  A: 

I'd say "sort of". The things to remember about task queues are:

1) a limit on operations per minute/hour/day is not the same as repeating something at regular intervals. Even with the token bucket size set to 1, I don't think you're guaranteed that those repetitions will be evenly spaced. It depends how seriously they mean it when they say the queue is implemented as a token bucket, and whether that statement is supposed to be a guaranteed part of the interface. This being labs, nothing is guaranteed yet.

2) if a task fails then it's requeued. If a cron job fails, it's logged and not retried until it's due again. So a cron job doesn't behave the same way as either a task which adds a copy of itself and then refreshes your feed, or a task which refreshes your feed and then adds a copy of itself (the second pattern is sketched below).
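
To illustrate, a minimal sketch of that second pattern, using the labs taskqueue module current when this was written; the handler URL, the hourly interval, and the refresh_feed() helper are assumptions:

    # Sketch: a task that refreshes the feed, then re-enqueues itself.
    # The URL, interval and refresh_feed() are hypothetical.
    from google.appengine.api.labs import taskqueue  # labs path as of this writing
    from google.appengine.ext import webapp

    class RefreshFeedTask(webapp.RequestHandler):
        def post(self):
            refresh_feed()  # hypothetical: fetch and parse the feed
            # Run again in roughly an hour. Note the caveat above: if this
            # request fails, the whole task is retried, so the feed may be
            # refreshed twice and a duplicate copy may be enqueued.
            taskqueue.add(url='/tasks/refresh-feed', countdown=3600)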

It may well be possible to mock up cron jobs using tasks, but I doubt it's worth it. If you're trying to work around a cron job which takes more than 30 seconds to run (or hits any other request limit), then you can split the work up into pieces and have a cron job which adds all the pieces to a task queue (sketched below). There was some talk (on the GAE blog?) about asynchronous urlfetch, which might be the ultimate best way of updating RSS feeds.
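
A rough sketch of that split, assuming a hypothetical Feed model and worker URL (the cron schedule itself would live in cron.yaml):

    # Sketch: a cron-triggered handler fans the work out to the task
    # queue, one task per feed. Feed and the worker URL are hypothetical.
    from google.appengine.api.labs import taskqueue
    from google.appengine.ext import db, webapp

    class Feed(db.Model):
        url = db.StringProperty()

    class FanOutFeeds(webapp.RequestHandler):
        def get(self):  # App Engine cron invokes its URLs with GET
            for feed in Feed.all():
                taskqueue.add(url='/tasks/refresh-feed',
                              params={'feed_url': feed.url})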

Steve Jessop
async urlfetch exists today, see http://code.google.com/appengine/docs/python/urlfetch/asynchronousrequests.html -- but I'm not sure how it would be the ultimate best way of updating RSS feeds; maybe you have something else in mind?
Alex Martelli
For some reason I was anticipating something that would call back a URL when the fetched data arrived. Not sure where I got that idea, though; perhaps my imagination. If you're updating a lot of RSS feeds, you need the HTTP requests to somehow be parallel, and task queues alone only allow so many simultaneous instances. Quite possibly the API you point to does the job already.
Steve Jessop
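
A sketch of what parallel fetches look like with the asynchronous urlfetch API linked above; the feed list and parse_feed() helper are placeholders:

    from google.appengine.api import urlfetch

    FEEDS = ['http://example.com/a.rss', 'http://example.com/b.rss']  # placeholders

    # Kick off all the fetches without blocking.
    rpcs = []
    for url in FEEDS:
        rpc = urlfetch.create_rpc(deadline=10)
        urlfetch.make_fetch_call(rpc, url)
        rpcs.append(rpc)

    # get_result() blocks until each fetch completes; the requests
    # themselves run concurrently.
    for rpc in rpcs:
        result = rpc.get_result()
        parse_feed(result.content)  # hypothetical parsing helper
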
worth adding that you can also use a cron job to fill/manage the task queue, so you can have it both ways.
Ted Pennings
+1  A: 

The way I look at it is that if I am just parsing one RSS feed, a Cron job might be good enough. If I have to parse X number of RSS feeds, specified at run time by a user or any other system variable, then I would choose tasks every time.

I only say this because in the past I had to execute many user-defined Twitter searches at regular intervals, and with Cron jobs I ended up making a very bad queuing system to execute the requests that needed to be run. It didn't scale, and it didn't help that the smallest interval a cron job can use is 1 minute (I had more searches to perform than there are minutes in the day).
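
A sketch of the task-based version, assuming a hypothetical worker URL and run_twitter_search() helper:

    from google.appengine.api.labs import taskqueue
    from google.appengine.ext import webapp

    def enqueue_searches(searches):
        # One task per user-defined search; the queue drains them as
        # fast as its configured rate allows, with no 1-minute floor.
        for query in searches:
            taskqueue.add(url='/tasks/run-search', params={'query': query})

    class RunSearchWorker(webapp.RequestHandler):
        def post(self):
            query = self.request.get('query')
            run_twitter_search(query)  # hypothetical search helper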

The cool thing about tasks is that you can give them an ETA, so you can say "I would like this to be executed 47 seconds in the future" or "I would like this to be executed at 12:30" (see the sketch below).
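
For example (the task URL is a placeholder; a naive eta datetime is treated as UTC):

    import datetime
    from google.appengine.api.labs import taskqueue

    # Run 47 seconds from now.
    taskqueue.add(url='/tasks/run-search', countdown=47)

    # Run at an absolute time, e.g. 12:30 UTC on a given day.
    taskqueue.add(url='/tasks/run-search',
                  eta=datetime.datetime(2009, 7, 1, 12, 30))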

Kinlan
+2  A: 

I didn't understand the differences very well until I watched the Google I/O video where they explain it. The official source is usually the best.

youtube video

slides from the presentation

mcotton