My getElementText function below works well with [0], because the XML I was working with didn't have duplicate tags.

from xml.dom import minidom

def getElementText(element, tagName):
    # Text of the first descendant named tagName, in document order
    return str(element.getElementsByTagName(tagName)[0].firstChild.data)

doc = minidom.parse("/Users/smcho/Desktop/hello.xml")
outputTree = doc.getElementsByTagName("Output")[0]

print(getElementText(outputTree, "Number"))

However, when I parse the following XML, getElementText(outputTree, "Number") gives me the value from <ConnectedTerminal><Number>1</Number></ConnectedTerminal> instead of <Number>0</Number>. getElementsByTagName searches all descendants, so there are two "Number" elements, and getElementText returns the first one, which is the nested one.

<Output>
  <ConnectedTerminal>
    <Node>5</Node>
    <Number>1</Number>
  </ConnectedTerminal>
  <Type>int8</Type>
  <Number>0</Number>
</Output>
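To see why [0] picks the nested element, here is a minimal check. It assumes the XML above is loaded with minidom.parseString instead of the question's file path. getElementsByTagName walks every descendant in document order, so the <Number> inside <ConnectedTerminal> comes first.

```python
from xml.dom import minidom

# The <Output> document from the question, inlined as a string
XML = """<Output>
  <ConnectedTerminal>
    <Node>5</Node>
    <Number>1</Number>
  </ConnectedTerminal>
  <Type>int8</Type>
  <Number>0</Number>
</Output>"""

doc = minidom.parseString(XML)
output_tree = doc.getElementsByTagName("Output")[0]

# getElementsByTagName searches all descendants, not just direct children,
# and returns the matches in document order.
numbers = [n.firstChild.data for n in output_tree.getElementsByTagName("Number")]
print(numbers)  # ['1', '0']
```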

Is there any solution to this problem? Is there a way to get only <Number>0</Number>, or only <ConnectedTerminal><Number>1</Number></ConnectedTerminal>?

A: 

No, there's no direct DOM method for getting only the direct children of an element with a given tag name. But it's fairly easy to write one:

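def getChildElementsByTagName(element, tag):
    # Only direct children are checked; '*' matches any element name.
    children = []
    for child in element.childNodes:
        if child.nodeType == child.ELEMENT_NODE and tag in (child.tagName, '*'):
            children.append(child)  # Python lists use append(), not push()
    return children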
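Here is a quick sketch comparing this kind of helper with minidom's recursive getElementsByTagName. It assumes the question's XML is parsed with minidom.parseString. The helper in the sketch uses append(), because Python lists have no push() method. Whitespace text nodes between elements are skipped by the ELEMENT_NODE check.

```python
from xml.dom import minidom

XML = """<Output>
  <ConnectedTerminal>
    <Node>5</Node>
    <Number>1</Number>
  </ConnectedTerminal>
  <Type>int8</Type>
  <Number>0</Number>
</Output>"""

def getChildElementsByTagName(element, tag):
    """Return the direct element children of element named tag ('*' matches any)."""
    children = []
    for child in element.childNodes:
        if child.nodeType == child.ELEMENT_NODE and tag in (child.tagName, '*'):
            children.append(child)
    return children

root = minidom.parseString(XML).documentElement  # the <Output> element

direct = getChildElementsByTagName(root, "Number")  # children only
everywhere = root.getElementsByTagName("Number")    # all descendants
print(len(direct), len(everywhere))  # 1 2
print(direct[0].firstChild.data)     # 0
print([c.tagName for c in getChildElementsByTagName(root, "*")])
# ['ConnectedTerminal', 'Type', 'Number']
```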

Here's also a safer text-getting function, so you don't have to worry about text split across multiple nodes, a missing text node when an element is empty, or CDATA sections.

def getTextContent(element):
    # Recursively join all text below element, including CDATA sections.
    texts = []
    for child in element.childNodes:
        if child.nodeType == child.ELEMENT_NODE:
            texts.append(getTextContent(child))
        elif child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE):
            texts.append(child.data)
    return u''.join(texts)
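A self-contained check of the text helper on two edge cases: an empty element and text mixed with a CDATA section. The small sample document is made up for illustration. This sketch accepts both TEXT_NODE and CDATA_SECTION_NODE children, so CDATA content is kept whether or not minidom reports it as a separate node.

```python
from xml.dom import minidom

def getTextContent(element):
    """Concatenate all text under element, including CDATA sections."""
    texts = []
    for child in element.childNodes:
        if child.nodeType == child.ELEMENT_NODE:
            texts.append(getTextContent(child))
        elif child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE):
            texts.append(child.data)
    return u''.join(texts)

# Hypothetical sample: an empty element and a mix of text and CDATA
doc = minidom.parseString("<Output><Empty/><Split>a<![CDATA[<b>]]>c</Split></Output>")
empty = doc.getElementsByTagName("Empty")[0]
split = doc.getElementsByTagName("Split")[0]

print(repr(getTextContent(empty)))  # ''
print(getTextContent(split))        # a<b>c
# Reading firstChild.data directly would fail here: the empty element has no children.
print(empty.firstChild)             # None
```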

Then just:

>>> getTextContent(getChildElementsByTagName(doc.documentElement, u'Number')[0])
u'0'
>>> getTextContent(getChildElementsByTagName(doc, u'Output')[0].getElementsByTagName(u'Number')[0])
u'1'
bobince
A:

If lxml is an option (it's much nicer than minidom), you can do:

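from lxml import etree

doc = etree.fromstring(xml)  # xml: the <Output> document from the question, as a string

# find() with a bare tag name only matches direct children of <Output>
node = doc.find('Number')
print(node.text)  # 0

# XPath selects the <Number> nested inside <ConnectedTerminal>
node = doc.xpath('//ConnectedTerminal/Number')[0]
print(node.text)  # 1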

Also see the XPath tutorial.
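If lxml isn't installed, the standard library's xml.etree.ElementTree offers the same find() call plus a limited XPath subset. This is a swap-in, not what this answer uses. The sketch assumes the question's XML is available as a string.

```python
import xml.etree.ElementTree as ET

XML = """<Output>
  <ConnectedTerminal>
    <Node>5</Node>
    <Number>1</Number>
  </ConnectedTerminal>
  <Type>int8</Type>
  <Number>0</Number>
</Output>"""

root = ET.fromstring(XML)  # the <Output> element

# A bare tag name only searches the direct children of <Output>.
direct = root.find("Number").text
print(direct)  # 0

# ElementTree's path subset can also reach the nested element.
nested = root.find("ConnectedTerminal/Number").text
print(nested)  # 1
anywhere = root.find(".//ConnectedTerminal/Number").text
print(anywhere)  # 1
```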

ars