Hey guys,

I faced a problem grabbing the content from a couple of blog feeds I have crawled.

I'm not sure of the reason, but parsing one or two of the blogs with feedparser gives me this particular error:

import feedparser

results = feedparser.parse(url)

ent = []

for entry in results.entries:
    e = {}
    e['title'] = entry.title
    e['content'] = entry.content[0].value  # fails here when an entry has no content
    ent.append(e)

object has no attribute 'content'

or

object has no attribute 'link'

This hasn't been the case for my other blogs. Could empty entry content result in this?

+1  A: 

There is a mapping between the XML tags used in the feed and the attributes available on the entries in feedparser. View the source of one of the feeds that has been causing the problem and see what tags it uses. You might find it doesn't include content for the entries or that the links are in a field like uid rather than link.

You will then need to write your code to handle the slight variations, either by using try/except or by checking for specific attributes with hasattr.
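As a rough sketch of the attribute-checking approach, something like the following might work. The helper names (`entry_text`, `entry_to_dict`, `collect_entries`) are made up for illustration, and falling back to `summary` when `content` is missing is an assumption about what you want, not something feedparser does for you.

```python
def entry_text(entry):
    """Return the entry body, preferring content and falling back to summary."""
    # Some feeds only provide a summary/description, so check before indexing.
    if hasattr(entry, "content") and entry.content:
        return entry.content[0].value
    # Assumption: an empty string is an acceptable default when neither exists.
    return getattr(entry, "summary", "")


def entry_to_dict(entry):
    """Build a plain dict from an entry without assuming any attribute exists."""
    return {
        "title": getattr(entry, "title", ""),
        "link": getattr(entry, "link", None),
        "content": entry_text(entry),
    }


def collect_entries(url):
    """Parse a feed URL and return one dict per entry."""
    # Imported here so the helpers above can be used without feedparser installed.
    import feedparser

    results = feedparser.parse(url)
    return [entry_to_dict(entry) for entry in results.entries]
```

Calling `collect_entries(url)` then gives a list of dicts, and an entry with no `content` or `link` ends up with a default value instead of raising an error. Wrapping the attribute access in try/except AttributeError would be the other option mentioned above.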

If you post a link to one of the feeds in question I might be able to offer some more advice.

mikej
@mikej, quick question. Do the custom script templates used by various Blogspot blogs alter the way the content is structured? After looking at the code, I found that a couple of blogs' content is in the 'summary' field instead of the 'content' field.
goh
I took a quick look at Blogspot and couldn't see a way to edit the template for the feeds. The blog I looked at used `content` in the Atom feed and `description` in the RSS. Have you got the URLs for a couple you looked at that had differences?
mikej