Is there an easy way to accomplish the same thing in Python that XSL accomplishes with:

<xsl:strip-space elements="*"/>

So, for instance, in the following:

from xml.etree import ElementTree

for event, elem in ElementTree.iterparse("/tmp/example.xml"):
    if elem.tag == "example":
        print(ElementTree.tostring(elem, encoding="unicode"))

how can I make sure that, when the example nodes are printed, all the spaces and line feeds that sit between children of the example node in the input file have been removed?

+2  A: 

I believe you need to explicitly manipulate the subtree to strip every text and tail:

from xml.etree import ElementTree

for event, elem in ElementTree.iterparse("/tmp/example.xml"):
    if elem.tag == "example":
        # iter() walks elem and all its descendants
        # (getiterator() was removed in Python 3.9)
        for x in elem.iter():
            if x.text:
                x.text = x.text.strip()
            if x.tail:
                x.tail = x.tail.strip()
        print(ElementTree.tostring(elem, encoding="unicode"))
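
To make the effect easy to check without a file on disk, the same loop can be wrapped in a helper and run against an in-memory document. This is only a sketch: `strip_subtree`, `stripped_examples` and the `SAMPLE` document are names made up for illustration, not part of ElementTree.

```python
import io
from xml.etree import ElementTree

def strip_subtree(elem):
    """Strip leading/trailing whitespace from every text and tail under elem."""
    for x in elem.iter():  # elem itself, then all descendants
        if x.text:
            x.text = x.text.strip()
        if x.tail:
            x.tail = x.tail.strip()
    return elem

# Hypothetical sample input, standing in for /tmp/example.xml
SAMPLE = b"<root>\n  <example>\n    <a> 1 </a>\n    <b>2</b>\n  </example>\n</root>"

def stripped_examples(source):
    """Return the serialized, whitespace-stripped <example> elements in source."""
    results = []
    for event, elem in ElementTree.iterparse(source):
        if elem.tag == "example":
            results.append(
                ElementTree.tostring(strip_subtree(elem), encoding="unicode")
            )
    return results

print(stripped_examples(io.BytesIO(SAMPLE)))
# prints ['<example><a>1</a><b>2</b></example>']
```

Note that this also trims the padding inside `<a> 1 </a>`, which goes a little further than xsl:strip-space: that instruction only drops text nodes consisting entirely of whitespace.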
Alex Martelli
+1  A: 

A more elegant solution could wrap the iterator in a generator that simply blanks out whitespace-only text nodes:

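import re

# Matches strings made up entirely of whitespace
whitespaces = re.compile(r'\s*$')

def omit_whitespaces(events):
    for event, elem in events:
        # text and tail are None when absent, so check before matching
        if elem.text and whitespaces.match(elem.text): elem.text = ''
        if elem.tail and whitespaces.match(elem.tail): elem.tail = ''
        yield event, elem

def strip_whitespaces(events):
    for event, elem in events:
        if elem.text: elem.text = elem.text.strip()
        if elem.tail: elem.tail = elem.tail.strip()
        yield event, elem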

Then use it as follows. Pick omit_whitespaces to leave text that contains non-whitespace characters untouched, or strip_whitespaces to also trim the leading and trailing spaces of such text:

for event, elem in omit_whitespaces(ElementTree.iterparse("/tmp/example.xml")):
    if elem.tag == "example":
        print(ElementTree.tostring(elem, encoding="unicode"))

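Note that in this case you must use only the 'end' event, which is what iterparse reports by default. On a 'start' event the parser may hand you an element whose text and children are not complete yet.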
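
The difference between the two wrappers, and the explicit 'end' event, can be seen on a small in-memory document. Again this is a sketch under assumptions: the `SAMPLE` document and the `render` helper are invented for the demonstration, and the wrappers are restated with guards for missing text so the block runs on its own.

```python
import io
import re
from xml.etree import ElementTree

WHITESPACE_ONLY = re.compile(r"\s*$")

def omit_whitespaces(events):
    """Blank out whitespace-only text/tail; leave all other text untouched."""
    for event, elem in events:
        if elem.text and WHITESPACE_ONLY.match(elem.text):
            elem.text = ""
        if elem.tail and WHITESPACE_ONLY.match(elem.tail):
            elem.tail = ""
        yield event, elem

def strip_whitespaces(events):
    """Trim leading/trailing whitespace from every text and tail."""
    for event, elem in events:
        if elem.text:
            elem.text = elem.text.strip()
        if elem.tail:
            elem.tail = elem.tail.strip()
        yield event, elem

# Hypothetical sample input, standing in for /tmp/example.xml
SAMPLE = b"<root>\n  <example>\n    <a> 1 </a>\n  </example>\n</root>"

def render(wrapper):
    """Serialize each <example> element after passing events through wrapper."""
    # events=("end",) is the default, spelled out here for clarity
    events = ElementTree.iterparse(io.BytesIO(SAMPLE), events=("end",))
    return [
        ElementTree.tostring(elem, encoding="unicode")
        for event, elem in wrapper(events)
        if elem.tag == "example"
    ]

print(render(omit_whitespaces))   # prints ['<example><a> 1 </a></example>']
print(render(strip_whitespaces))  # prints ['<example><a>1</a></example>']
```

One caveat worth checking on large inputs: as far as I can tell, an element's tail is filled in only once the parser has moved past the element, so a wrapper that touches each element at its own 'end' event may occasionally see a tail that has not arrived yet. Cleaning the whole subtree when the enclosing example element ends avoids that.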

That said, I don't know ElementTree very well, and I haven't tested this code.

liori