I'm using jQuery to load arbitrary XML strings (fragments of a larger document) into the browser DOM and manipulate them, then using XMLSerializer to serialize them back to strings and send them to the server, where they are processed (by Python and lxml) and re-integrated into a full XML document.

The XML starts and ends in a git repository. I've found that the attributes on elements processed by XMLSerializer are reversed in order, resulting in spurious changes showing up in my repository, like so:

- <literal><token kind="w" id="en-us-esv-xeaugcbzgo">sent</token><token kind="s" id="en-us-esv-xeaugcbzgw"> </token></literal>
+ <literal><token id="en-us-esv-xeaugcbzgo" kind="w">sent</token><token id="en-us-esv-xeaugcbzgw" kind="s"> </token></literal>

This isn't a bug in any of the tools I'm using. Of course, the order of attributes on an XML element isn't supposed to matter. But because git is a line-oriented SCM, these spurious and insignificant changes distract from the actual substantive changes that I want to track.

The Question: Is there a way to keep the serializer from re-ordering my attributes? Alternately, do any tools exist to specify/constrain the ordering of attributes?

Edited above for clarity: I am aware that, according to the XML Specification, "the order of attribute specifications in a start-tag or empty-element tag is not significant": http://www.w3.org/TR/REC-xml/#sec-starttags. Suffice it to say, the ordering of attributes is significant to me. :)

A: 

If this matters, the bug isn't in re-ordering the attributes, but in it mattering. Let it order them however it wants, and fix the bug.

Edit:

Wait a minute. Why is this being put into a repository? If it's output rather than source, then its value in a repository is as a non-edited resource rather than as source, and it's stored as a convenience. Otherwise, why are you letting a computer process change it?

This is analogous to putting a binary into a repository, with the same reasons why that's often bad, and the same reasons for making exceptions.

Jon Hanna
SCMs are typically line-oriented, and unfortunately, AFAIK there aren't any SCM systems that "understand" XML to the point of being able to ignore things that don't matter in XML. For that matter, defining whether two XML files are "the same" can depend on the semantics of the data.
Jim Garrison
@Jim, well then XML is not the output to use.
Jon Hanna
This isn't a bug, merely an issue of convenience: the re-ordering of attributes obscures other, meaningful changes that are being applied to the data. Whether XML is the correct output and whether it belongs in a repository are outside the scope of the question.
David Eyk
A: 

I've taken the direction @Tomalak suggested and am "fixing" the order server-side. Thankfully, the original order was alphabetical, and the order produced by XMLSerializer is reverse alphabetical. My server-side XML tool, lxml, maintains document attribute order, so reversing the order is simple:

import json
from lxml import etree as ET  # assumed import; the snippet relies on lxml's xpath()

xmls = json.loads(self.data['xmls'])
out = []
for xml in xmls:
    # DOM adds an XHTML namespace... silly DOM.
    xml = xml.replace('xmlns="http://www.w3.org/1999/xhtml"', '')
    tree = ET.fromstring(xml)
    for el in tree.xpath('//*'):
        # el.attrib preserves attribute order, but the browser DOM has
        # reversed that order, so collect the pairs and flip them back.
        attrs = list(el.attrib.items())
        attrs.reverse()
        # Put them back in the order we want.
        el.attrib.clear()
        for k, v in attrs:
            el.attrib[k] = v
    # encoding='unicode' returns a text string on both Python 2 and 3.
    out.append(ET.tostring(tree, encoding='unicode'))

My line-based diffs are useful again!
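
Reversing only works while the serializer's output stays exactly reverse alphabetical. Since the original order was alphabetical, a variation that may be less fragile is to sort the attributes outright instead of reversing them. The following is a minimal sketch of that idea, not the code used above: the function name `sort_attributes` is hypothetical, and it uses the standard library's `xml.etree.ElementTree` in place of lxml so that it runs without extra dependencies (the same loop should also work with `lxml.etree`).

```python
import xml.etree.ElementTree as ET


def sort_attributes(xml_text):
    """Reserialize xml_text with every element's attributes in alphabetical order.

    Hypothetical helper for illustration; namespaced attributes sort by
    their expanded '{uri}local' name.
    """
    root = ET.fromstring(xml_text)
    for el in root.iter():
        # Collect the pairs in sorted order, then re-insert them so the
        # serializer emits them alphabetically.
        items = sorted(el.attrib.items())
        el.attrib.clear()
        el.attrib.update(items)
    return ET.tostring(root, encoding="unicode")


original = '<literal><token kind="w" id="a">sent</token></literal>'
from_browser = '<literal><token id="a" kind="w">sent</token></literal>'

# Both orderings normalize to the same string, so the diff stays quiet.
print(sort_attributes(original) == sort_attributes(from_browser))
```

Because the result no longer depends on the order the browser happens to emit, running it again on already-normalized text leaves the text unchanged.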

David Eyk