This code gives the output you want:
from xml.dom.minidom import parseString

document = """\
<assetsMain>
<assetParent type='character' shortName='char'>
<asset>
pub
</asset>
<asset>
car
</asset>
</assetParent>
<assetParent type='par' shortName='pr'>
<asset>
camera
</asset>
<asset>
rig
</asset>
</assetParent>
</assetsMain>
"""

def getNestedList():
    dom = parseString(document)
    li = []
    for assetParent in dom.childNodes[0].getElementsByTagName("assetParent"):
        # read type and shortName
        a = [assetParent.getAttribute("type"), assetParent.getAttribute("shortName")]
        # read content of asset nodes
        b = [asset.childNodes[0].data.strip() for asset in assetParent.getElementsByTagName("asset")]
        # put the two lists together in a list and append that to the result list
        li.append([a, b])
    return li

if __name__ == "__main__":
    # the parenthesised form works as a statement in Python 2 and as a function call in Python 3
    print(getNestedList())
Note that getElementsByTagName lets us select which descendant nodes we want to read. Attributes are read with getAttribute on a node. Text content inside a node is read through the data property of its text node (the text itself is a child node as well). If you are reading text inside a node, you can check that it really is text with:
if node.nodeType == node.TEXT_NODE:
Also note that there is no checking or error handling here. An asset node with no child nodes will raise an IndexError when childNodes[0] is accessed.
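As a rough sketch of how the text-node check and the missing error handling could be combined, the helper below collects only the TEXT_NODE children of a node and returns an empty string when there are none. The name text_of and the small sample document are made up for illustration; they are not part of the code above.

```python
from xml.dom.minidom import parseString

def text_of(node):
    """Return the stripped text directly inside node, or "" if it has none."""
    # Keep only children that really are text nodes; an empty node yields no parts.
    parts = [child.data for child in node.childNodes
             if child.nodeType == child.TEXT_NODE]
    return "".join(parts).strip()

# Hypothetical sample: the second asset is empty and would raise an
# IndexError with asset.childNodes[0] in the original approach.
sample = "<assetParent><asset> pub </asset><asset/></assetParent>"
dom = parseString(sample)
names = [text_of(asset) for asset in dom.getElementsByTagName("asset")]
print(names)
```

Whether an empty asset should become an empty string, be skipped, or be treated as an error depends on your data, so adjust the helper accordingly.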
That said, a nested list three levels deep makes me want to suggest that you use dictionaries instead.
Output of getNestedList() (under Python 2, hence the u prefixes):
[[[u'character', u'char'], [u'pub', u'car']], [[u'par', u'pr'], [u'camera', u'rig']]]
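If you go with the dictionary suggestion, one possible shape is a list with one dictionary per assetParent. This is only a sketch: the function name get_asset_dicts and the keys "type", "shortName" and "assets" are my own choices, and the sample XML is a compacted copy of the document above.

```python
from xml.dom.minidom import parseString

document = """\
<assetsMain>
<assetParent type='character' shortName='char'>
<asset>pub</asset>
<asset>car</asset>
</assetParent>
<assetParent type='par' shortName='pr'>
<asset>camera</asset>
<asset>rig</asset>
</assetParent>
</assetsMain>
"""

def get_asset_dicts(xml_text):
    """Return one dictionary per assetParent element (hypothetical layout)."""
    dom = parseString(xml_text)
    result = []
    for parent in dom.getElementsByTagName("assetParent"):
        result.append({
            "type": parent.getAttribute("type"),
            "shortName": parent.getAttribute("shortName"),
            # Assumes the first child of each asset is its text; empty assets are skipped.
            "assets": [asset.firstChild.data.strip()
                       for asset in parent.getElementsByTagName("asset")
                       if asset.firstChild is not None],
        })
    return result

for entry in get_asset_dicts(document):
    print(entry)
```

Named keys such as entry["assets"] tend to be easier to read later than positional access like li[0][1].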