I have created an XML file using xml.etree.ElementTree in Python. I then use

tree.write(filename, "UTF-8") 

to write out the document to a file.

But when I open filename in a text editor (vi on Linux), there are no newlines between the tags. Everything is on one big line.

How can I write out the document in a "pretty printed" format so that there are newlines (and hopefully indentation etc.) between all the XML tags?

Thanks!

A: 

According to this thread, your best bet would be to install PyXML and use it to pretty-print the ElementTree XML content (as ElementTree doesn't seem to have a pretty-printer by default in Python):

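# Requires Python 2 and the PyXML package (which provides xml.dom.ext).
import xml.etree.ElementTree as ET

from xml.dom.ext.reader import Sax2
from xml.dom.ext import PrettyPrint
from StringIO import StringIO

def prettyPrintET(etNode):
    # Serialize the ElementTree node and re-parse it as a PyXML DOM document.
    reader = Sax2.Reader()
    docNode = reader.fromString(ET.tostring(etNode))
    # PrettyPrint writes to a stream, so capture its output in memory.
    tmpStream = StringIO()
    PrettyPrint(docNode, stream=tmpStream)
    return tmpStream.getvalue()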
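
PyXML only runs on Python 2. For readers on Python 3.9 or newer, the standard library's xml.etree.ElementTree.indent() can do a similar job without extra packages. This is a different technique from the PyXML approach above, shown only as a minimal sketch: the helper name pretty_string and the sample document are made up for illustration.

```python
import xml.etree.ElementTree as ET

def pretty_string(root, space="  "):
    """Return `root` serialized with one element per line (needs Python 3.9+)."""
    # indent() rewrites whitespace in place, so work on a copy of the tree.
    copy = ET.fromstring(ET.tostring(root))
    ET.indent(copy, space=space)
    return ET.tostring(copy, encoding="unicode")

# Hypothetical sample document.
root = ET.XML("<tips><tip>1</tip><tip>2</tip></tips>")
print(pretty_string(root))
```

To write a file instead of a string, calling ET.indent(tree) before tree.write(filename, "UTF-8") should give the same layout on disk.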
ChristopheD
A: 

ElementTree itself has no pretty-printing support (at least before Python 3.9), but you can use other XML modules.

For example, xml.dom.minidom.Node.toprettyxml():

Node.toprettyxml(indent="\t", newl="\n", encoding=None)

Return a pretty-printed version of the document. indent specifies the indentation string and defaults to a tab; newl specifies the string emitted at the end of each line and defaults to \n.

Use indent and newl to fit your requirements.

An example, using the default formatting characters:

>>> from xml.dom import minidom
>>> from xml.etree import ElementTree
>>> tree1=ElementTree.XML('<tips><tip>1</tip><tip>2</tip></tips>')
>>> ElementTree.tostring(tree1)
'<tips><tip>1</tip><tip>2</tip></tips>'
>>> print minidom.parseString(ElementTree.tostring(tree1)).toprettyxml()
<?xml version="1.0" ?>
<tips>
    <tip>
        1
    </tip>
    <tip>
        2
    </tip>
</tips>

>>> 
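
The session above uses Python 2 syntax and the default indent and newl. A rough Python 3 equivalent with explicit formatting arguments might look like the sketch below; the helper name minidom_pretty and the two-space indent are arbitrary choices for illustration, not part of either library.

```python
from xml.dom import minidom
from xml.etree import ElementTree

def minidom_pretty(element, indent="  ", newl="\n"):
    """Pretty-print an ElementTree element by round-tripping through minidom."""
    # On Python 3, tostring() returns bytes; parseString() accepts them directly.
    raw = ElementTree.tostring(element)
    return minidom.parseString(raw).toprettyxml(indent=indent, newl=newl)

tree1 = ElementTree.XML("<tips><tip>1</tip><tip>2</tip></tips>")
print(minidom_pretty(tree1))
```

Passing encoding="UTF-8" to toprettyxml() makes it return bytes rather than str, which is convenient when the result goes straight into a file opened in binary mode.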
gimel
Good answer, but the only question is: why does minidom insert extraneous whitespace around `1` and `2` (which is significant in XML)?
ChristopheD
Good question ;-) Use with care.
gimel
Modify indent and newl.
gimel
A: 

The easiest solution, I think, is to switch to the lxml library. In most cases you can just change your import from `import xml.etree.ElementTree as etree` to `from lxml import etree` or similar.

You can then use the pretty_print option when serializing:

tree.write(filename, pretty_print=True)

(also available on etree.tostring)

Steven
Thanks Steven. This is what I ended up doing.
MK