I'm building a simple web-based RSS reader in Python, but I'm having trouble parsing the XML. I started out by trying a few things in the Python interactive shell.

>>> from xml.dom import minidom
>>> import urllib2 
>>> url ='http://www.digg.com/rss/index.xml'
>>> xmldoc = minidom.parse(urllib2.urlopen(url))
>>> channelnode = xmldoc.getElementsByTagName("channel")
>>> titlenode = channelnode[0].getElementsByTagName("title")
>>> print titlenode[0]
<DOM Element: title at 0xb37440> 
>>> print titlenode[0].nodeValue 
None

I played around with this for a while, but the nodeValue of every element seems to be None, even though the XML clearly contains values. What am I doing wrong?

+10  A: 

For RSS feeds you should try the Universal Feed Parser library. It simplifies the handling of RSS feeds immensely.

import feedparser
d = feedparser.parse('http://www.digg.com/rss/index.xml')
title = d.channel.title
unbeknown
+7  A: 

This is the syntax you are looking for:

>>> print titlenode[0].firstChild.nodeValue
digg.com: Stories / Popular

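Note that in the DOM an element node's own nodeValue is always None. The text is stored in a child text node of the element, which is why firstChild.nodeValue returns the title.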
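
To make the point above concrete, here is a minimal sketch that runs without network access. It is written for Python 3 (the transcript in the question is Python 2), and SAMPLE_RSS and get_text are hypothetical names made up for this illustration; they are not part of minidom or of the real feed.

```python
from xml.dom import minidom

# Hypothetical stand-in for the feed, so the example runs offline.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Feed</title>
    <item><title>First story</title></item>
  </channel>
</rss>"""


def get_text(element):
    """Join the data of all text and CDATA children of an element.

    Illustrative helper (an assumption, not a minidom API). It is a bit
    more forgiving than firstChild.nodeValue when an element is empty or
    its text is split across several child nodes.
    """
    return "".join(
        child.data
        for child in element.childNodes
        if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE)
    )


xmldoc = minidom.parseString(SAMPLE_RSS)
channel = xmldoc.getElementsByTagName("channel")[0]
title = channel.getElementsByTagName("title")[0]

print(title.nodeValue)             # None: element nodes carry no value themselves
print(title.firstChild.nodeValue)  # Example Feed: the text is in the child node
print(get_text(title))             # Example Feed
```

Keep in mind that getElementsByTagName searches all descendants, so calling it for "title" on the channel also returns the titles of the individual items; index 0 is the channel title only because it comes first in document order.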

Yuval A