Sometimes XML files need to be stored in a VCS. Such files are often edited with GUI tools that may reorder the elements however they like every time they save.

Also, VCS merging is usually line-oriented, and XML files often either look like one long line or are fully indented, like

<foo>
    <bar>
        <name>
            n3
        </name>
        <value>
            qqq3
        </value>
    </bar>
    <bar>
        <name>
            n2
        </name>
        <value>
            qqq2
        </value>
    </bar>
</foo>

while they should look like

<foo>
    <bar>  <name> n2 </name>  <value> qqq2 </value> </bar>
    <bar>  <name> n3 </name>  <value> qqq3 </value> </bar>
</foo>

(i.e. "partially indented") to be more compact and easier for humans to read and edit. One simple logical unit should occupy one line.

Even if somebody converts an XML file to such a nice format, someone else will edit it in a GUI tool that reorders and re-indents everything, and the result will be bad: unreadable, and the VCS will report massive changes even though there are almost no actual changes.

Is there a ready-made XSLT transformation (or other program) that converts all XML files to a unified format (e.g. one that sorts elements where their order does not matter and normalizes whitespace), and that lets me specify which elements should be one-liners?

Ideally, I could specify such a transformation as a filter in .gitattributes and git would handle this automatically.
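A minimal sketch of such a filter in Python (standard library only), assuming it only has to normalize whitespace and indentation; sorting and one-liners are not handled here. The script name `xmlnorm.py`, its `--filter` flag and the filter name `xmlnorm` are hypothetical. Note that `ElementTree` drops comments, processing instructions and the XML declaration, and squashing whitespace is only safe where whitespace in text is insignificant.

```python
import sys
import xml.etree.ElementTree as ET
from typing import Optional


def squash(s: Optional[str]) -> Optional[str]:
    """Collapse whitespace runs to one space; whitespace-only text becomes None."""
    if s is None:
        return None
    return " ".join(s.split()) or None


def canonicalize(xml_text: str, indent: str = "    ") -> str:
    """Re-serialize XML with normalized whitespace and uniform indentation."""
    root = ET.fromstring(xml_text)
    for node in root.iter():
        node.text = squash(node.text)
        node.tail = squash(node.tail)
    ET.indent(root, space=indent)  # requires Python 3.9+
    return ET.tostring(root, encoding="unicode") + "\n"


if __name__ == "__main__" and sys.argv[1:] == ["--filter"]:
    # As a git clean filter: read the file on stdin, write the canonical form to stdout.
    sys.stdout.write(canonicalize(sys.stdin.read()))
```

To hook it into git (an assumed setup), run `git config filter.xmlnorm.clean "python3 /path/to/xmlnorm.py --filter"` and add the line `*.xml filter=xmlnorm` to `.gitattributes`. Git then stores the normalized form when files are staged, so a GUI tool's re-indentation no longer shows up as a change.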

A: 

Yes, there are XML pretty-printers; I always use xmllint myself.

reinierpost
I also sometimes use `xmllint --format`, but: 1. I can't tune the indentation; 2. it just preserves whitespace in text nodes; 3. it doesn't sort.
Vi
Well, it doesn't know what it can safely sort, so that shouldn't be surprising. Apparently you need a formatter that is specific to the schema you're using. For that I'd probably write a Perl script based on XML::LibXML (an interface to libxml2, the library behind xmllint), but you can basically use anything.
reinierpost
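To illustrate the schema-specific idea, here is a small Python sketch, with the standard `xml.etree.ElementTree` standing in for XML::LibXML. The `SORT_KEYS` table and `sort_by_schema` are hypothetical and written for the question's `foo`/`bar` example: only children of the listed elements are sorted, because only knowledge of the schema tells you where order does not matter.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict

# Hypothetical schema knowledge: parents whose children may be reordered,
# mapped to the key those children are sorted by.
SORT_KEYS: Dict[str, Callable[[ET.Element], str]] = {
    # <bar> elements inside <foo> are sorted by the normalized text of their <name>
    "foo": lambda child: " ".join((child.findtext("name") or "").split()),
}


def sort_by_schema(elem: ET.Element) -> None:
    """Recursively sort the children of elements whose tag appears in SORT_KEYS."""
    key = SORT_KEYS.get(elem.tag)
    if key is not None:
        # sorted() is stable, so children with equal keys keep their relative order
        elem[:] = sorted(elem, key=key)
    for child in elem:
        sort_by_schema(child)
```

Elements not listed (here `bar` itself) keep their children in document order, so `name` stays before `value`.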
@reinierpost, I've already implemented it in XSLT.
Vi
@Vi: so I noticed, but the idea of this site is that there can be multiple useful answers to the same question.
reinierpost
A:

I did not test this in every XSLT processor (in fact, I only tested it in MSXSL):

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output omit-xml-declaration="yes"/>
    <xsl:template match="@*|node()" name="identity">
        <xsl:if test="self::bar">
            <xsl:text>&#xA;</xsl:text>
        </xsl:if>
        <xsl:copy>
            <xsl:apply-templates select="@*|node()">
            <xsl:sort select="normalize-space(name)"/>
            </xsl:apply-templates>
            <xsl:if test="self::foo">
                <xsl:text>&#xA;</xsl:text>
            </xsl:if>
        </xsl:copy>
    </xsl:template>
    <xsl:template match="text()">
        <xsl:value-of select="normalize-space(.)"/>
    </xsl:template>
</xsl:stylesheet>

Result:

<foo>
<bar><name>n2</name><value>qqq2</value></bar>
<bar><name>n3</name><value>qqq3</value></bar>
</foo>

Note: XML serialization may vary between processors. If it does, keep the logic but use text output, simulating XML serialization by writing the opening and closing tags and the attributes yourself.

Edit: Minor change to properly sort differently serialized input.
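The "serialize as text" idea from the note can be sketched in Python (standard library only): the code writes the tags itself instead of relying on a serializer, so each `bar` is guaranteed to end up on exactly one line. It assumes, like the sample, that the root contains only `bar` elements and that no namespaces are used; `one_bar_per_line` and `inline` are hypothetical names. For the indented input in the question it produces the same output as the "Result" above.

```python
import xml.etree.ElementTree as ET
from typing import Optional
from xml.sax.saxutils import escape, quoteattr


def norm(s: Optional[str]) -> str:
    """Like XPath normalize-space(): collapse whitespace runs and trim; None -> ''."""
    return " ".join((s or "").split())


def inline(elem: ET.Element) -> str:
    """Serialize elem and its subtree as text on a single line (assumes no namespaces)."""
    attrs = "".join(f" {name}={quoteattr(value)}" for name, value in elem.attrib.items())
    body = escape(norm(elem.text))
    for child in elem:
        # each child is followed by its (normalized) tail text, as in the source
        body += inline(child) + escape(norm(child.tail))
    return f"<{elem.tag}{attrs}>{body}</{elem.tag}>"


def one_bar_per_line(xml_text: str) -> str:
    """Sort the root's <bar> children by normalized <name>, one <bar> per line."""
    root = ET.fromstring(xml_text)
    bars = sorted(root.findall("bar"), key=lambda bar: norm(bar.findtext("name")))
    lines = [f"<{root.tag}>"] + [inline(bar) for bar in bars] + [f"</{root.tag}>"]
    return "\n".join(lines)
```

Writing the markup by hand means the line structure no longer depends on the processor's serializer, at the cost of handling escaping yourself (here via `xml.sax.saxutils`).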

Alejandro
Thanks, I've written a more universal XSLT based on parts of yours.
Vi
A: 

I created my own sort-indenter based on http://www.dpawson.co.uk/xsl/sect2/pretty.html and Alejandro's answer: http://vi-server.org/vi/sortindent.xsl. Mirrored here:

<!-- Change 'oneliner' and 'another_oneliner' to the names of the elements you want on one line -->
<!-- Remove the 'xsl:sort' elements if you don't want sorting -->
<!-- http://stackoverflow.com/questions/3157658/converting-xml-files-to-be-human-editable-and-managable-by-vcs/3160818#3160818 -->

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:output method="xml"/>
   <!-- Drop whitespace-only text nodes, so the input's indentation cannot affect sorting or output -->
   <xsl:strip-space elements="*"/>
   <xsl:param name="indent-increment" select="'   '" />

   <xsl:template match="*">
      <!-- True for elements that must be written on one line; passed down to their descendants -->
      <xsl:param name="skip_indent" select="name()='oneliner' or name()='another_oneliner'"/>
      <xsl:param name="indent" select="'&#xA;'"/>

      <xsl:if test="not($skip_indent)">
         <xsl:value-of select="$indent"/>
         <xsl:copy>
            <xsl:copy-of select="@*"/>
            <xsl:apply-templates>
               <!-- Sort key: normalized string value of the child's first attribute or child node -->
               <xsl:sort select="normalize-space(@*|node())"/>
               <xsl:with-param name="indent" select="concat($indent, $indent-increment)"/>
            </xsl:apply-templates>
            <xsl:if test="*">
               <xsl:value-of select="$indent"/>
            </xsl:if>
         </xsl:copy>
      </xsl:if>
      <xsl:if test="$skip_indent">
         <xsl:value-of select="$indent"/>
         <xsl:copy>
            <xsl:copy-of select="@*"/>
            <xsl:apply-templates>
               <xsl:sort select="normalize-space(@*|node())"/>
               <xsl:with-param name="indent" select="' '"/>
               <xsl:with-param name="skip_indent" select="true()"/>
            </xsl:apply-templates>
         </xsl:copy>
      </xsl:if>
   </xsl:template>

   <xsl:template match="comment()|processing-instruction()">
      <xsl:copy />
   </xsl:template>

   <!-- Normalize whitespace in every text node, not only inside one-liners -->
   <xsl:template match="text()">
      <xsl:value-of select="normalize-space(.)"/>
   </xsl:template>

</xsl:stylesheet>

Now "equivalent" variants of the original XML file (except for reordered attributes) map to the same resulting XML, the output is nicely formatted, and I can force some elements to be one-liners.

Vi
@Vi: I don't put much faith in general solutions ... Furthermore, suppose the following change in your input document: `<name>n2</name>`. Run your transformation and tell me whether this is the desired result.
Alejandro
@Alejandro, If you remove sorting, it will be just an indenter with the ability to "turn off" indenting for some elements. Sorting can be tweaked. There is already `<name>n2</name>`; I don't know what to change.
Vi
@Vi: What I meant was that you should try changing your input document in order to test your stylesheet. For example: normalize the space in the text node of the second `name` element and run your stylesheet again. Does it output the desired result?
Alejandro
@Alejandro, Thanks, there is a problem: I need to strip whitespace from 'oneliners'. Fixing...
Vi