tags:
views: 647
answers: 2

I am trying to build a GAE app that processes an RSS feed and stores all the data from the feed into Google Datastore. I use Minidom to extract content from the RSS feed. I also tried using Feedparser and BeautifulSoup but they did not work for me.

My app currently parses the feed and saves it to the Google Datastore in about 25 seconds on my local machine. When I uploaded the app and tried to use it, I got a DeadlineExceededError.

I would like to know if there is any way to speed up this process. The feed I use will eventually grow to more than 100 items over time.
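
For reference, the current code is roughly along these lines (a simplified sketch, not my exact code; the FeedItem model, its fields, and the text_of helper are just placeholders):

import urllib2
from xml.dom import minidom
from google.appengine.ext import db

class FeedItem(db.Model):
    title = db.StringProperty()
    link = db.StringProperty()
    description = db.TextProperty()

def text_of(item, tag):
    # Text of the first child element with the given tag, or None.
    elements = item.getElementsByTagName(tag)
    if elements and elements[0].firstChild:
        return elements[0].firstChild.data
    return None

def import_feed(url):
    dom = minidom.parseString(urllib2.urlopen(url).read())
    for item in dom.getElementsByTagName('item'):
        FeedItem(title=text_of(item, 'title'),
                 link=text_of(item, 'link'),
                 description=text_of(item, 'description')).put()  # one datastore write per item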

+2  A: 

It shouldn't take anywhere near that long. Here is how you might use the Universal Feed Parser.

# easy_install feedparser

And an example of using it:

import feedparser

feed = 'http://stackoverflow.com/feeds/tag?tagnames=python&sort=newest'
d = feedparser.parse(feed)
for entry in d['entries']:
    print entry.title

The documentation shows you how to pull other things out of a feed. If there is a specific issue you have, please post the details.

DisplacedAussie
Thanks for your response, DisplacedAussie. The one problem I had with feedparser was that I could not get the attributes of the tags. Could you tell me how to do that?
A_iyer
What attributes can you not access? http://www.feedparser.org/docs/index.html
DisplacedAussie
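
For standard elements, feedparser exposes attributes as dictionary keys on each entry. A minimal sketch (assuming the feed uses <link> and <enclosure> elements; attributes of nonstandard tags are not exposed this way):

import feedparser

d = feedparser.parse('http://stackoverflow.com/feeds/tag?tagnames=python&sort=newest')
for entry in d.entries:
    # Attributes of <link rel=... type=... href=...> come through as dict keys.
    for link in entry.get('links', []):
        print link.get('rel'), link.get('type'), link.get('href')
    # Likewise for <enclosure>; feedparser maps its url attribute to 'href'.
    for enc in entry.get('enclosures', []):
        print enc.get('href'), enc.get('type'), enc.get('length')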
A: 

I found a way to work around this issue, though I am not sure if this is the optimal solution.

Instead of Minidom, I have used cElementTree to parse the RSS feed. I process each "item" tag and its children in a separate task and add these tasks to the task queue.
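
A rough sketch of that pattern (the /process_item URL, the ProcessItem handler, and the FeedItem model are placeholders, not my exact code; older SDK versions import the queue from google.appengine.api.labs.taskqueue):

from xml.etree import cElementTree as ET
from google.appengine.api import taskqueue
from google.appengine.ext import db, webapp

class FeedItem(db.Model):
    title = db.StringProperty()
    link = db.StringProperty()
    description = db.TextProperty()

def enqueue_items(feed_xml):
    # Fan the <item> elements out to the task queue, one task per item,
    # so no single request runs long enough to hit the deadline.
    root = ET.fromstring(feed_xml)
    for item in root.findall('.//item'):
        taskqueue.add(url='/process_item',
                      params={'item_xml': ET.tostring(item)})

class ProcessItem(webapp.RequestHandler):
    # Worker handler mapped to /process_item; each call stores one item.
    def post(self):
        item = ET.fromstring(self.request.get('item_xml').encode('utf-8'))
        FeedItem(title=item.findtext('title'),
                 link=item.findtext('link'),
                 description=item.findtext('description')).put()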

This has helped me avoid the DeadlineExceededError. I get the "This resource uses a lot of CPU resources" warning though.

Any idea on how to avoid the warning?

A_iyer
