ansaurus

Question

Answer 1

+1 A:

This should do the trick:

from xml.dom.minidom import parseString
dom = parseString('<something><data>I WANT THIS</data></something>')
data = dom.getElementsByTagName('data')[0].childNodes[0].data

That is, you need to go one level deeper into the DOM structure to reach the text child node and then access its value.

Andy 2009-09-16 16:09:13

Note that if the element is empty there will be no child Text node, so childNodes[0] will fail with an IndexError.

bobince 2009-09-16 16:20:55
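A minimal sketch of guarding against the empty case described in the comment above. The helper name first_text is made up for illustration, and it assumes the first child, when present, is a text node:

```python
from xml.dom.minidom import parseString

def first_text(xml_string, tag):
    """Return the text of the first <tag> element, or '' if it has no children."""
    element = parseString(xml_string).getElementsByTagName(tag)[0]
    child = element.firstChild  # None when the element is empty
    return child.data if child is not None else ''

full = first_text('<something><data>I WANT THIS</data></something>', 'data')
empty = first_text('<something><data></data></something>', 'data')
```

Checking firstChild against None avoids indexing into an empty childNodes list.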

To collect text data properly, one has to traverse childNodes and concatenate the data from every node whose nodeType is either TEXT_NODE or CDATA_SECTION_NODE. The ElementTree interface is simpler.

Denis Otkidach 2009-09-16 16:56:22
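The traversal described in the comment above might look roughly like this. The helper name collect_text and the sample markup with a CDATA section are assumptions added for illustration. The last line shows the ElementTree equivalent for comparison:

```python
from xml.dom import Node
from xml.dom.minidom import parseString
import xml.etree.ElementTree as ET

def collect_text(element):
    """Concatenate the data of all direct text and CDATA children."""
    parts = []
    for node in element.childNodes:
        if node.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE):
            parts.append(node.data)
    return ''.join(parts)

# Hypothetical input mixing plain text with a CDATA section
xml_string = '<something><data>I WANT <![CDATA[THIS & MORE]]></data></something>'

dom = parseString(xml_string)
dom_text = collect_text(dom.getElementsByTagName('data')[0])

# ElementTree exposes the same text directly on the element
et_text = ET.fromstring(xml_string).findtext('data')
```

Both dom_text and et_text should hold the full string, and collect_text returns an empty string for an element with no text children.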

Answer 2

+2 A:

So the way to look at it is that "I WANT THIS" is actually another node. It's a text child of "data".

from xml.dom.minidom import parseString
dom = parseString(data)
nodes = dom.getElementsByTagName('data')

At this point, "nodes" is a NodeList and in your example, it has one item in it which is the "data" element. Correspondingly the "data" element also only has one child which is a text node "I WANT THIS".

So you could just do something like this:

print(nodes[0].firstChild.nodeValue)

Note that if your input contains more than one "data" tag, you should iterate over "nodes" rather than index into it directly.

Brent Nash 2009-09-16 16:10:41
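A short sketch of that iteration, assuming a made-up input with two "data" elements. Elements without any child are skipped here, which is one possible choice rather than the only one:

```python
from xml.dom.minidom import parseString

# Hypothetical input with more than one <data> element
xml_string = '<something><data>first</data><data>second</data></something>'
nodes = parseString(xml_string).getElementsByTagName('data')

# Read the text of every <data> element that has a child node
values = [node.firstChild.nodeValue for node in nodes
          if node.firstChild is not None]
```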


Getting text values from XML in Python
