ansaurus

Question

Answer 1

+2 A:

Use xmldiff, a Python tool that works out the differences between two similar XML files, much as diff does for plain text.

superjoe30 2008-11-26 19:19:11

Answer 2

+3 A:

First normalize the two XML documents, then compare them. I've used the following with lxml:

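    from lxml import etree, objectify

    # objectify's default parser discards whitespace-only text, so
    # re-serialising both documents normalises their formatting
    obj1 = objectify.fromstring(expect)
    expect = etree.tostring(obj1)
    obj2 = objectify.fromstring(xml)
    result = etree.tostring(obj2)
    self.assertEqual(expect, result)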

Kozyarchuk 2008-11-26 19:35:15

Oh man, I had tried this and thought the attributes were ordered differently, but I looked again and I was actually just missing one in my output. Thanks for hitting me over the head.

Adam Endicott 2008-11-26 20:03:17

Heh. A slight note of caution: etree does not document any guarantee that attributes are serialised in any particular order. The current pure-Python implementation of ElementTree does sort() them, but it's not clear you can rely on that remaining so.

bobince 2008-11-26 23:19:28
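A stdlib-only sketch of the same normalise-then-compare idea, which sidesteps the attribute-order question by sorting attributes explicitly before serialising. The caution above turned out to be justified: since Python 3.8, ElementTree writes attributes in insertion order instead of sorting them. The function name `normalise` is illustrative, and dropping whitespace-only text is an assumption that such text is only indentation.

```python
import xml.etree.ElementTree as ET


def normalise(xml_text: str) -> bytes:
    """Re-serialise XML with whitespace-only text dropped and attributes sorted."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        # Assume whitespace-only text and tails are just indentation.
        if elem.text is not None and not elem.text.strip():
            elem.text = None
        if elem.tail is not None and not elem.tail.strip():
            elem.tail = None
        # Re-insert attributes in sorted order, so the output is the same
        # whether the serialiser sorts them or keeps insertion order.
        items = sorted(elem.attrib.items())
        elem.attrib.clear()
        elem.attrib.update(items)
    return ET.tostring(root)
```

In a test method this becomes `self.assertEqual(normalise(expect), normalise(xml))`.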

Answer 3

+1 A:

If the problem is really just the whitespace and attribute order, and you have no other constructs than text and elements to worry about, you can parse the strings using a standard XML parser and compare the nodes manually. Here's an example using minidom, but you could write the same in etree pretty simply:

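from xml.dom import minidom

def isEqualXML(a, b):
    da, db = minidom.parseString(a), minidom.parseString(b)
    return isEqualElement(da.documentElement, db.documentElement)

def significantChildren(node):
    # Skip whitespace-only text nodes so indentation differences are ignored
    return [c for c in node.childNodes
            if not (c.nodeType == c.TEXT_NODE and not c.data.strip())]

def isEqualElement(a, b):
    if a.tagName != b.tagName:
        return False
    # Attribute order is not significant in XML, so compare sorted pairs
    if sorted(a.attributes.items()) != sorted(b.attributes.items()):
        return False
    achildren, bchildren = significantChildren(a), significantChildren(b)
    if len(achildren) != len(bchildren):
        return False
    for ac, bc in zip(achildren, bchildren):
        if ac.nodeType != bc.nodeType:
            return False
        if ac.nodeType == ac.TEXT_NODE and ac.data != bc.data:
            return False
        if ac.nodeType == ac.ELEMENT_NODE and not isEqualElement(ac, bc):
            return False
    return True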
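For comparison, here is a minimal sketch of the same recursive check written against the stdlib's `xml.etree.ElementTree`. The function names are illustrative. It treats missing text, whitespace-only text and leading/trailing whitespace around text as insignificant, and ElementTree's default parser already drops comments and processing instructions. `attrib` is a plain dict, so comparing it is order-insensitive.

```python
import xml.etree.ElementTree as ET


def xml_equal_etree(a: str, b: str) -> bool:
    """Compare two XML strings, ignoring attribute order and surrounding whitespace."""
    return _elements_equal(ET.fromstring(a), ET.fromstring(b))


def _norm(text):
    # None, whitespace-only text and surrounding whitespace all compare equal
    return (text or "").strip()


def _elements_equal(a, b) -> bool:
    if a.tag != b.tag or a.attrib != b.attrib:
        return False
    if _norm(a.text) != _norm(b.text) or _norm(a.tail) != _norm(b.tail):
        return False
    if len(a) != len(b):  # len() of an Element is its number of children
        return False
    return all(_elements_equal(ac, bc) for ac, bc in zip(a, b))
```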

If you need a more thorough equivalence comparison, covering other types of nodes including CDATA, PIs, entity references, comments, doctypes, namespaces and so on, you could use the DOM Level 3 Core method isEqualNode. Neither minidom nor etree has that, but pxdom is one implementation that supports it:

import pxdom

def isEqualXML(a, b):
    da, db = pxdom.parseString(a), pxdom.parseString(b)
    return da.isEqualNode(db)

(You may want to change some of the DOMConfiguration options on the parse if you need to specify whether entity references and CDATA sections match their replaced equivalents.)

A slightly more roundabout way of doing it would be to parse, then re-serialise to canonical form and do a string comparison. Again, pxdom supports the DOM Level 3 LS option ‘canonical-form’, which you could use to do this; an alternative using the stdlib's minidom implementation is to use c14n. However, you must have the PyXML extensions installed for this, so you still can't quite do it within the stdlib:

from xml.dom import minidom
from xml.dom.ext import c14n

def isEqualXML(a, b):
    da, db = minidom.parseString(a), minidom.parseString(b)
    ca, cb = c14n.Canonicalize(da), c14n.Canonicalize(db)
    return ca == cb
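Newer versions of Python (3.8 and later) do ship a canonical serialiser in the stdlib: `xml.etree.ElementTree.canonicalize`, which implements C14N 2.0. A minimal sketch of the canonicalise-then-compare approach using it, assuming Python 3.8+ (the function name is illustrative). C14N sorts attributes and writes empty elements as start/end tag pairs; `strip_text=True` additionally trims whitespace around text content, so indentation differences disappear.

```python
from xml.etree.ElementTree import canonicalize


def xml_equal_c14n(a: str, b: str) -> bool:
    """Compare two XML strings by their C14N 2.0 canonical form."""
    # strip_text=True trims leading/trailing whitespace of text content
    return canonicalize(a, strip_text=True) == canonicalize(b, strip_text=True)
```

Note that C14N keeps comments out of the output unless `with_comments=True` is passed.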

bobince 2008-11-26 19:56:19

Answer 4

+1 A:

Why are you examining the XML data at all?

The way to test object serialization is to create an instance of the object, serialize it, deserialize it into a new object, and compare the two objects. When you make a change that breaks serialization or deserialization, this test will fail.
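A minimal sketch of such a round-trip test with `unittest`. The `Point` class and its `to_xml`/`from_xml` methods are hypothetical stand-ins for whatever object you are actually serialising; a dataclass is used so that `==` compares field values.

```python
import unittest
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Point:
    """Hypothetical serialisable object, used only for illustration."""
    x: int
    y: int

    def to_xml(self) -> str:
        elem = ET.Element("point", x=str(self.x), y=str(self.y))
        return ET.tostring(elem, encoding="unicode")

    @classmethod
    def from_xml(cls, text: str) -> "Point":
        elem = ET.fromstring(text)
        return cls(int(elem.get("x")), int(elem.get("y")))


class RoundTripTest(unittest.TestCase):
    def test_round_trip(self):
        original = Point(3, 4)
        # Serialise, deserialise into a new object, then compare the objects
        restored = Point.from_xml(original.to_xml())
        self.assertEqual(original, restored)
```

Run it with `python -m unittest`. The test never inspects the XML text itself, so it stays valid if the serialiser changes attribute order or formatting.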

The only extra problem that checking the XML data will catch is a serializer emitting a superset of what the deserializer requires, with the deserializer silently ignoring anything it doesn't expect.

Of course, if something else is going to be consuming the serialized data, that's another matter. But in that case, you ought to be thinking about establishing a schema for the XML and validating it.

Robert Rossney 2008-11-26 20:46:45

Yes, something else is going to be consuming the serialized data. I may get to the point of building a schema and validating it, but for now doing a string comparison of the XML is good enough.

Adam Endicott 2008-11-26 21:32:47

Answer 5

A:

The Java component dbUnit does a lot of XML comparisons, so you might find it useful to look at their approach (especially to find any gotchas that they may have already addressed).

Rob Williams 2008-11-27 00:20:52


Comparing XML in a unit test in Python