views:

841

answers:

3

Hi there, I'm trying to read an xml file into python, pull out certain elements from the xml file and then write the results back to an xml file (so basically it's the original xml file without several elements). When I use .removeChild(source) it removes the individual elements I want to remove but leaves white space in its stead making the file very unreadable. I know I can still parse the file with all of the whitespace, but there are times when I need to manually alter the values of certain element's attributes and it makes it difficult (and annyoing) to do this. I can certainly remove the whitespace by hand but if I have dozens of these xml files that's not really feasible.

Is there a way to do .removeChild and have it remove the white space as well?

Here's what my code looks like:

dom=parse(filename)
main=dom.childNodes[0]
sources = main.getElementsByTagName("source")
for source in sources :
    name=source.getAttribute("name")
    spatialModel=source.getElementsByTagName("spatialModel")
    val1=float(spatialModel[0].getElementsByTagName("parameter")[0].getAttribute("value"))
    val2=float(spatialModel[0].getElementsByTagName("parameter")[1].getAttribute("value"))
    if angsep(val1,val2,X,Y)>=ROI :
        main.removeChild(source)
    else:
        print name,val1,val2,angsep(val1,val2,X,Y)
f=open(outfile,"write")
f.write("<?xml version=\"1.0\" ?>\n")
f.write(dom.saveXML(main))
f.close()

Thanks much for the help.

+1  A: 

If you have PyXML installed you can use xml.dom.ext.PrettyPrint()

pwdyson
A: 

I couldn't figure out how to do this using xml.dom.minidom, so I just wrote a quick function to read in the output file and remove all blank lines and then rewrite to a new file:

f = open(xmlfile).readlines()
w = open('src_model.xml','w')
empty=re.compile('^$')
for line in open(xmlfile).readlines():
    if empty.match(line):
        continue
    else: 
        w.write(line)

This works good enough for me :)

Jamie
A: 

… for searching ppl:

This funny snippet

skey = lambda x: getattr(x, "tagName", None)
mainnode.childNodes = sorted( 
  [n for n in mainnode.childNodes if n.nodeType != n.TEXT_NODE],
  cmp=lambda x, y: cmp(skey(y), skey(x)))

removes all text nodes (and, also, reverse sorts them by tagname).

I.e. you can (recursively) do tr.childNodes = [recurseclean(n) for n in tr.childNodes if n.nodeType != n.TEXT_NODE] to remove all text nodes

Or you might want to do something like … if n.nodeType != n.TEXT_NODE or not re.match(r'^[:whitespace:]*$', n.data, re.MULTILINE) (did't try that one myself) if you need text nodes with some data. Or something more complex to leave text inside specific tags.

After that tree.toprettyxml(…) will return well-formatted XML text.