views:

2688

answers:

4

Hello!

I have the following code.

from xml.dom.minidom import Document

doc = Document()

root = doc.createElement('root')
doc.appendChild(root)
main = doc.createElement('Text')
root.appendChild(main)

text = doc.createTextNode('Some text here')
main.appendChild(text)

print doc.toprettyxml(indent='\t')

The result is:

<?xml version="1.0" ?>
<root>
    <Text>
     Some text here
    </Text>
</root>

This is all fine and dandy, but what if I want the output to look like this?

<?xml version="1.0" ?>
<root>
    <Text>Some text here</Text>
</root>

Can this easily be done?

Orjanp...

+3  A: 

Can this easily be done?

It depends what exact rule you want, but generally no, you get little control over pretty-printing. If you want a specific format you'll usually have to write your own walker.

The DOM Level 3 LS parameter format-pretty-print in pxdom comes pretty close to your example. Its rule is that if an element contains only a single TextNode, no extra whitespace will be added. However it (currently) uses two spaces for an indent rather than four.

>>> doc= pxdom.parseString('<a><b>c</b></a>')
>>> doc.domConfig.setParameter('format-pretty-print', True)
>>> print doc.pxdomContent
<?xml version="1.0" encoding="utf-16"?>
<a>
  <b>c</b>
</a>

(Adjust encoding and output format for whatever type of serialisation you're doing.)

If that's the rule you want, and you can get away with it, you might also be able to monkey-patch minidom's Element.writexml, eg.:

>>> from xml.dom import minidom
>>> def newwritexml(self, writer, indent= '', addindent= '', newl= ''):
...     if len(self.childNodes)==1 and self.firstChild.nodeType==3:
...         writer.write(indent)
...         self.oldwritexml(writer) # cancel extra whitespace
...         writer.write(newl)
...     else:
...         self.oldwritexml(writer, indent, addindent, newl)
... 
>>> minidom.Element.oldwritexml= minidom.Element.writexml
>>> minidom.Element.writexml= newwritexml

All the usual caveats about the badness of monkey-patching apply.

bobince
Thanks. since I'm not going to be manually reading these files so often, and this is not easily done, I guess I just leave it the way it is. And focus my attention to more important things, like getting this load/save to xml working.Orjanp...
Orjanp
A: 

The pyxml package offers a simple solution to this by using the xml.dom.ext.PrettyPrint() function. It can also print to a file descriptor.

But the pyxml package is no longer maintained.

Oerjan Pettersen

Orjanp
+1  A: 

I was looking for exactly the same thing, and I came across this post. (the indenting provided in xml.dom.minidom broke another tool that I was using to manipulate the XML, and I needed it to be indented) I tried the accepted solution with a slightly more complex example and this was the result:

In [1]: import pxdom

In [2]: xml = '<a><b>fda</b><c><b>fdsa</b></c></a>'

In [3]: doc = pxdom.parseString(xml)

In [4]: doc.domConfig.setParameter('format-pretty-print', True)

In [5]: print doc.pxdomContent
<?xml version="1.0" encoding="utf-16"?>
<a>
  <b>fda</b><c>
    <b>fdsa</b>
  </c>
</a>

The pretty printed XML isn't formatted correctly, and I'm not too happy about monkey patching (i.e. I barely know what it means, and understand it's bad), so I looked for another solution.

I'm writing the output to file, so I can use the xmlindent program for Ubuntu ($sudo aptitude install xmlindent). So I just write the unformatted to the file, then call the xmlindent from within the python program:

from subprocess import Popen, PIPE
Popen(["xmlindent", "-i", "2", "-w", "-f", "-nbe", file_name], 
      stderr=PIPE, 
      stdout=PIPE).communicate()

The -w switch causes the file to be overwritten, but annoyingly leaves a named e.g. "myfile.xml~" which you'll probably want to remove. The other switches are there to get the formatting right (for me).

P.S. xmlindent is a stream formatter, i.e. you can use it as follows:

cat myfile.xml | xmlindent > myfile_indented.xml

So you might be able to use it in a python script without writing to a file if you needed to.

markmuetz
A: 

This solution worked for me without monkey patching or ceasing to use minidom:

from xml.dom.ext import PrettyPrint
from StringIO import StringIO

def toprettyxml_fixed (node, encoding='utf-8'):
    tmpStream = StringIO()
    PrettyPrint(node, stream=tmpStream, encoding=encoding)
    return tmpStream.getvalue()

http://ronrothman.com/public/leftbraned/xml-dom-minidom-toprettyxml-and-silly-whitespace/#best-solution

Stefanie Tellex
Yes, this solution is mentioned here already. But if I'm not mistaken, this solution requires pyxml to be installed. Pyxml seems to be unmaintained at the moment. So I guess a different solution is a better one, like lxml.http://stackoverflow.com/questions/507405/python-xml-minidom-generate-textsome-text-text-element/623868#623868
Orjanp