I'm trying to canonicalize the representation of some XML data by sorting each element's attributes by name (not value). The idea is to keep textual differences minimal when attributes are added or removed, and to prevent different editors from introducing equivalent variants. These XML files are under source control, and developers want to diff the changes without resorting to specialized XML tools.

I was surprised not to find an XSL example of how to do this. Basically I just want the identity transform with sorted attributes. I came up with the following, which seems to work in all my test cases:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" encoding="UTF-8" indent="yes"/>
  <!-- Identity transform for every node type except attributes -->
  <xsl:template match="*|/|text()|comment()|processing-instruction()">
    <xsl:copy>
      <!-- Copy this node's attributes in name order (selects nothing unless it is an element) -->
      <xsl:for-each select="@*">
        <xsl:sort select="name(.)"/>
        <xsl:copy/>
      </xsl:for-each>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>

As a total XSL n00b I would appreciate any comments on style or efficiency. I thought it might be helpful to post it here, since it does not seem to be a common example.

+1  A: 

Well done for solving the problem. As I assume you know, the order of attributes is not significant to XML parsers, so the primary benefit of this exercise is for humans; a machine may reorder them on input or output in unpredictable ways.

Canonicalization in XML is not trivial and you would be well advised to use the canonicalizer provided with any reasonable XML toolkit rather than writing your own.
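
As a rough sketch of what using a toolkit's canonicalizer can look like, assuming Python 3.8 or later: the standard library's `xml.etree.ElementTree.canonicalize` produces C14N 2.0 output. The helper name `canonical_form` and the two sample documents are made up for illustration and are not from the question.

```python
import xml.etree.ElementTree as ET


def canonical_form(xml_text: str) -> str:
    """Return the C14N 2.0 serialization of an XML string (hypothetical helper)."""
    return ET.canonicalize(xml_text)


# Two lexically different spellings of the same element:
# different attribute order, different quote characters, different empty-tag style.
variant_a = '<item z="3" a="1" m="2"/>'
variant_b = "<item m='2' z='3' a='1'></item>"

canon_a = canonical_form(variant_a)
canon_b = canonical_form(variant_b)

print(canon_a)
print(canon_a == canon_b)
```

The canonical form writes attributes in a fixed sorted order and normalizes quoting, so the two variants can be compared as plain strings. It does not indent or pretty-print, so it may still need a formatting pass before it is pleasant to diff by eye.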

peter.murray.rust
Does an XSL transformer guarantee that attributes are written out in the order you specify?
Kathy Van Stone
XML tools give no guarantee that XML output preserves attribute order, even if you try to construct it as above. Remember also that you cannot even guarantee which quote character is used around the values. Lexical comparison of XML is usually a poor idea.
peter.murray.rust
+3  A: 

Since XSLT is a functional language, a for-each is often the easiest path for us humans, but it is not always the most efficient one for XSLT processors, since they cannot fully optimize it. The same transform written with apply-templates:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" encoding="UTF-8" indent="yes"/>
  <!-- Elements: copy the element, emit its attributes sorted by name, then process its children -->
  <xsl:template match="*">
    <xsl:copy>
      <xsl:apply-templates select="@*">
        <xsl:sort select="name()"/>
      </xsl:apply-templates>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>
  <!-- Attributes, comments and processing instructions are copied as-is; text is handled by the built-in template -->
  <xsl:template match="@*|comment()|processing-instruction()">
    <xsl:copy/>
  </xsl:template>
</xsl:stylesheet>

The difference is trivial in this case, though, and for an "XSL n00b" I think you solved the problem very well indeed.

Martijn Laarman