Hi

I have an XML document (in the form of a tree) and need to create a sub-tree out of it.

For example:

<a>
  <b>
    <c>Hello</c>
  <d>
    <e>Hi</e>
</a>

The subtree would be:

<root>
  <a>
    <b>
      <c>Hello</c>
    </b>
  </a>
  <a>
    <d>
      <e>Hi</e>
    </d>
  </a>
</root>

What is the best XML library in Python for this? Any existing algorithm that already does this would also be helpful. Note: the XML document won't be large; it will easily fit in memory.

A: 

ElementTree is good and simple for both "reading" and "writing".

Your first XML example (I edited your question just to add formatting so it would be readable!) is invalid: I assume it is missing the close tags for b and d, which do appear in what you call "the subtree" (which looks nothing like a subtree to me, but does look like it's intended as a rewrite of your first form).
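A quick way to confirm that the first snippet is not well-formed is to hand it to the parser and see whether it raises. This is only a minimal sketch, assuming Python 3 and the standard-library xml.etree.ElementTree; is_well_formed is a helper name made up for this illustration.

```python
import xml.etree.ElementTree as ET

# The question's first snippet, verbatim: <b> and <d> are never closed.
broken = """<a>
  <b>
    <c>Hello</c>
  <d>
    <e>Hi</e>
</a>
"""

def is_well_formed(text):
    """Return True if `text` parses as XML, False otherwise (hypothetical helper)."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

print(is_well_formed(broken))  # False
```

With the two close tags added back, the same check should pass.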

Net of "prettyfication" issues (e.g. adding newlines and indents to make the resulting XML look pretty;-), this code should do what you're asking, if I understand you correctly:

try:
  import xml.etree.cElementTree as et
  import cStringIO as sio
except ImportError:
  import xml.etree.ElementTree as et
  import StringIO as sio

xmlin = sio.StringIO('''<a>
  <b>
    <c>Hello</c>
  </b>
  <d>
    <e>Hi</e>
  </d>
</a>
''')

tin = et.parse(xmlin)
top = tin.getroot()
tou = et.ElementTree(et.Element('root'))
newtop = tou.getroot()
for child in list(top):  # list(top) gives the same children as the deprecated getchildren()
  subtree = et.Element(top.tag)
  subtree.append(child)
  newtop.append(subtree)

import sys
tou.write(sys.stdout)

The try/except at the start tries to use the C versions of the modules on "normal" platforms where they're available, and falls back to the pure-Python modules otherwise (for App Engine, Jython, IronPython, ...).

Then I build two trees -- tin, the input one, from the XML string you're given; tou, the output one, initially empty except for the root element.

All the rest is a very simple loop on all subelements of tin's root: for each, a suitable subtree is built and appended to the subelements of tou's root -- that's all there is to it.

The last two lines show the resulting tree (not pretty, due to whitespace issues, but perfectly correct in terms of XML structure;-).
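For readers on Python 3, here is a rough sketch of the same loop. The cStringIO and StringIO modules exist only in Python 2, and xml.etree.cElementTree and Element.getchildren() were removed in Python 3.9, so this version imports xml.etree.ElementTree directly (it picks up the C accelerator on its own when one is available). The names split_children, wrapper_tag, and pretty are made up for this sketch and are not part of ElementTree; ET.indent only exists from Python 3.9 on, so the call is guarded.

```python
import xml.etree.ElementTree as ET

def split_children(xml_text, wrapper_tag="root", pretty=False):
    """Wrap each child of the document root in its own copy of the root tag.

    Hypothetical helper for illustration; the parameter names are assumptions.
    """
    top = ET.fromstring(xml_text)
    newtop = ET.Element(wrapper_tag)
    for child in list(top):
        # A fresh element with the root's tag; the root's attributes are not copied.
        subtree = ET.Element(top.tag)
        subtree.append(child)
        newtop.append(subtree)
    if pretty and hasattr(ET, "indent"):
        ET.indent(newtop)  # adds newlines and indentation in place
    # encoding="unicode" returns a str rather than bytes
    return ET.tostring(newtop, encoding="unicode")

xml_in = "<a><b><c>Hello</c></b><d><e>Hi</e></d></a>"
print(split_children(xml_in))
# <root><a><b><c>Hello</c></b></a><a><d><e>Hi</e></d></a></root>
```

Passing pretty=True addresses the whitespace issue mentioned above on interpreters that provide ET.indent; elsewhere the output stays compact.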

Alex Martelli