tags:

views:

3276

answers:

6

What is the best way (or even the various ways) to pretty print xml in python?

+1  A: 

XML pretty print for python looks pretty good for this task. (Appropriately named, too.)

An alternative is to use pyXML, which has a PrettyPrint function.

Daniel Lew
+6  A: 

lxml is recent, updated, and includes a pretty print function

import lxml.etree as etree

x = etree.parse("filename")
print etree.tostring(x, pretty_print = True)

Check out the lxml tutorial: http://codespeak.net/lxml/tutorial.html

1729
Only downside to lxml is a dependency on external libraries. This I think is not so bad under Windows the libraries are packaged with the module. Under linux they are an `aptitude install` away. Under OS/X I'm not sure.
intuited
+5  A: 

If you're using a DOM implementation, each has their own form of pretty-printing built-in:

# minidom
#
document.toprettyxml()

# 4DOM
#
xml.dom.ext.PrettyPrint(document, stream)

# pxdom (or other DOM Level 3 LS-compliant imp)
#
serializer.domConfig.setParameter('format-pretty-print', True)
serializer.writeToString(document)

If you're using something else without its own pretty-printer — or those pretty-printers don't quite do it the way you want —  you'd probably have to write or subclass your own serialiser.

bobince
+1  A: 

I had some problems with minidom's pretty print. I'd get a UnicodeError whenever I tried pretty-printing a document with characters outside the given encoding, eg if I had a β in a document and I tried doc.toprettyxml(encoding='latin-1'). Here's my workaround for it:

def toprettyxml(doc, encoding):
    """Return a pretty-printed XML document in a given encoding."""
    unistr = doc.toprettyxml().replace(u'<?xml version="1.0" ?>',
                          u'<?xml version="1.0" encoding="%s"?>' % encoding)
    return unistr.encode(encoding, 'xmlcharrefreplace')
giltay
+5  A: 
import xml.dom.minidom

xml = xml.dom.minidom.parse(xml_fname) # or xml.dom.minidom.parseString(xml_string)
pretty_xml_as_string = xml.toprettyxml()
Ben Noland
+1  A: 

Here's my (hacky?) solution to get around the ugly text node problem.

uglyXml = doc.toprettyxml(indent='  ')

text_re = re.compile('>\n\s+([^<>\s].*?)\n\s+</', re.DOTALL)    
prettyXml = text_re.sub('>\g<1></', uglyXml)

print prettyXml

The above code will produce:

<?xml version="1.0" ?>
<issues>
  <issue>
    <id>1</id>
    <title>Add Visual Studio 2005 and 2008 solution files</title>
    <details>We need Visual Studio 2005/2008 project files for Windows.</details>
  </issue>
</issues>

Instead of this:

<?xml version="1.0" ?>
<issues>
  <issue>
    <id>
      1
    </id>
    <title>
      Add Visual Studio 2005 and 2008 solution files
    </title>
    <details>
      We need Visual Studio 2005/2008 project files for Windows.
    </details>
  </issue>
</issues>

Disclaimer: There are probably some limitations.

nbolton
Thank you! This was my one gripe with all the pretty printing methods. Works well with the few files I tried.
Ian