I was trying out the bit.ly API for URL shortening and got it to work. It returns an XML document to my script. I wanted to extract the tag, but I can't seem to parse it properly.

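import urllib2

# full_url is the bit.ly request URL built earlier in the script
askfor = urllib2.Request(full_url)
response = urllib2.urlopen(askfor)
the_page = response.read()  # the whole response body as a string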

So the_page contains the XML document. I tried:

from xml.dom.minidom import parse
doc = parse(the_page)

This causes an error. What am I doing wrong?

+2  A: 

You don't provide an error message, so I can't be sure this is the only problem. However, xml.dom.minidom.parse does not take a string of XML; a string argument is treated as a filename. From the docstring for parse:

Parse a file into a DOM by filename or file object.

You should try:

response = urllib2.urlopen(askfor)
doc = parse(response)

since response behaves like a file object. Alternatively, you could use the parseString function in minidom instead and pass the_page as the argument.
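
To make the difference between the two calls concrete, here is a minimal sketch. The sample XML and the BytesIO object standing in for the urlopen response are assumptions for illustration only; the real bit.ly output will look different.

```python
import io
from xml.dom.minidom import parse, parseString

# Hypothetical stand-in for the_page; not the actual bit.ly response.
sample_xml = b"<response><url>http://example.com/abc</url></response>"

# parseString takes the XML document itself.
doc_from_string = parseString(sample_xml)

# parse takes a filename or a file-like object; BytesIO plays the role
# of the urlopen response here.
doc_from_file = parse(io.BytesIO(sample_xml))

# Both routes produce a Document with the same root element.
root_from_string = doc_from_string.documentElement.tagName
root_from_file = doc_from_file.documentElement.tagName
```

Either route should give an equivalent Document, so the choice mostly depends on whether you have already called read() on the response.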

EDIT: to extract the URL, you'll need to do:

url_nodes = doc.getElementsByTagName('url')
url = url_nodes[0]
print url.childNodes[0].data

The result of getElementsByTagName is a list of all matching nodes (just one in this case). As you noticed, url is an Element; it contains a child Text node, which holds the data you need.
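
As a rough illustration of that Element/Text relationship, the sketch below walks the same steps on a made-up document. The XML string is an assumption, not the real bit.ly response.

```python
from xml.dom.minidom import parseString

# Hypothetical sample document; the real response will differ.
doc = parseString("<response><url>http://example.com/abc</url></response>")

url_nodes = doc.getElementsByTagName("url")  # list of matching Element nodes
url = url_nodes[0]                           # the <url> Element itself
text_node = url.childNodes[0]                # its child Text node

# The Element is only a container; the string lives on the Text node.
is_element = url.nodeType == url.ELEMENT_NODE
is_text = text_node.nodeType == text_node.TEXT_NODE
short_url = text_node.data
```

Printing url shows the Element's repr, which is why the list looked like [<DOM Element: url at ...>]; the string only appears once you reach the Text node's data attribute.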

ars
That does parse the_page, but I can't seem to get an individual tag. Using doc.getElementsByTagName("url") returns [<DOM Element: url at 0x13cbf80>] instead of the data in between.
Ali
Updated my answer, see above.
ars
+1  A: 
from xml.dom.minidom import parseString
doc = parseString(the_page)

See the documentation for xml.dom.minidom.

Jed Smith
That does parse the_page, but I can't seem to get an individual tag. Using doc.getElementsByTagName("url") returns [<DOM Element: url at 0x13cbf80>] instead of the data.
Ali
Continue reading the documentation. The object you are getting back has attributes that let you (a) get its children and (b) get the data.
Jed Smith
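
For reference, one possible way to combine those two attributes (childNodes for the children, data for the text) is a small helper like the one below. The name get_text and the sample XML are hypothetical, and this is only a sketch: it ignores CDATA sections and nested elements.

```python
from xml.dom.minidom import parseString

def get_text(element):
    """Join the data of every Text node directly under element."""
    parts = []
    for child in element.childNodes:
        if child.nodeType == child.TEXT_NODE:
            parts.append(child.data)
    return "".join(parts)

# Hypothetical sample document, including an element with no text.
doc = parseString(
    "<response><url>http://example.com/abc</url><empty/></response>"
)

url_text = get_text(doc.getElementsByTagName("url")[0])
empty_text = get_text(doc.getElementsByTagName("empty")[0])
```

Looping over childNodes rather than indexing childNodes[0] avoids an IndexError when an element turns out to be empty.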