I have a project where the main file we deal with is an old XML file whose creator wrote a very unstructured DTD: all elements are optional and can occur zero or more times. Even better, the application that reads the file actually expects many of those values to be present. I have created an XSD based on the known application requirements, and in it I moved the unordered element lists into sequences.

Is there a simple transformation process (e.g. XSLT) that can take an old XML file and order its elements in a specified way, so that we can validate it against the new XSD?

Example:

<Top>
  <A/>
  <D/>
  <B/>
  <C/>
  <A/>
</TOP>

INTO

<Top>
  <A/>
  <A/>
  <B/>
  <C/>
  <D/>
</TOP>

Child elements may also contain elements that need to be sorted into the order the new sequences expect. Thanks!

+1  A: 

I'm assuming you don't want to alphabetize your elements, but rather put them in an order you specify. Try this. You'll need an XSLT processor (e.g. Saxon); save the stylesheet below as a .xsl file.

<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output indent="yes" method="xml" version="1.0" />

<xsl:template match="Top">
   <xsl:copy>
      <xsl:for-each select="A">
         <xsl:copy-of select="." />  
      </xsl:for-each> 
      <xsl:for-each select="B">
         <xsl:copy-of select="." />  
      </xsl:for-each>  
      <xsl:for-each select="C">
         <xsl:copy-of select="." />  
      </xsl:for-each>  
      <xsl:for-each select="D">
         <xsl:copy-of select="." />  
      </xsl:for-each>
   </xsl:copy>  
</xsl:template>

</xsl:stylesheet>

BIG caveat though: XML is case-sensitive, so your <Top> and </TOP> tags don't match. That means the input is not well-formed XML, and the XSLT processor will report an error and quit.

<xsl:copy-of> copies the selected element together with its attributes and ALL its descendants, in their original order. To re-order deeper levels, you can replace xsl:copy-of with xsl:apply-templates and write a similar template for the child element that uses xsl:copy and outputs the next level in order.
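
A minimal sketch of that nested approach, assuming A is the only element whose children need re-ordering. The child names AA, AB and AC are hypothetical placeholders, not taken from the question; substitute the names from your own schema.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Top: shallow-copy the element, keep its attributes,
       then emit the children in the order the schema expects. -->
  <xsl:template match="Top">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <!-- A is handed to its own template so its children get re-ordered too. -->
      <xsl:apply-templates select="A"/>
      <!-- B, C and D are copied unchanged, one name at a time to fix the order. -->
      <xsl:copy-of select="B"/>
      <xsl:copy-of select="C"/>
      <xsl:copy-of select="D"/>
    </xsl:copy>
  </xsl:template>

  <!-- A: the same pattern one level down.
       AA, AB and AC are hypothetical child names. -->
  <xsl:template match="A">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:copy-of select="AA"/>
      <xsl:copy-of select="AB"/>
      <xsl:copy-of select="AC"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Note that in this sketch any child element not listed by name, and any text sitting directly inside Top or A, is dropped from the output, so every element the schema allows has to appear in the list.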

carillonator
Very clear and helpful. This is great!
Scanningcrew
That's a very... weird way to do this. Why not just `<xsl:copy-of select="A"/><xsl:copy-of select="B"/>...`?
Pavel Minaev
+4  A: 

Instead of listing every element to order inside a template, you can take a more declarative approach and embed a "lookup list" in the stylesheet that spells out the expected order:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
   xmlns:my="my-namespace" 
   exclude-result-prefixes="my">
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
  <my:Top>
    <my:A>
      <my:AA/>
      <my:AB/>
      <my:AC/>
    </my:A>
    <my:B/>
    <my:C/>
    <my:D/>
  </my:Top>
  <xsl:template match="my:*">
    <xsl:param name="source"/>
    <xsl:variable name="current-lookup-elem" select="current()"/>
    <xsl:for-each select="$source/*[name()=local-name($current-lookup-elem)]">
      <xsl:copy>
        <xsl:apply-templates select="$current-lookup-elem/*">
          <xsl:with-param name="source" select="current()"/>
        </xsl:apply-templates>
        <xsl:copy-of select="text()"/>
      </xsl:copy>
    </xsl:for-each>
  </xsl:template>
  <xsl:template match="/Top">
    <xsl:apply-templates select="document('')/*/my:*">
      <xsl:with-param name="source" select="/"/>
    </xsl:apply-templates>
  </xsl:template>
</xsl:stylesheet>

This sample:

<Top>
  <A>
    <AC/>
    <AA/>
  </A>
  <D/>
  <B/>
  <C>yyy</C>
  <A>
    <AB/>
    <AC/>
    <AA>xxx</AA>
  </A>
</Top>

will return:

<Top>
    <A>
     <AA/>
     <AC/>
    </A>
    <A>
     <AA>xxx</AA>
     <AB/>
     <AC/>
    </A>
    <B/>
    <C>yyy</C>
    <D/>
</Top>
Erlock
I like this idea, going to try and incorporate it as well.
Scanningcrew
I tried this method, but it is only copying the values (my example was bad: each element actually contains #PCDATA and is not empty). Is there a way to get it to print the element tags as well?
Scanningcrew
Add <xsl:copy-of select="text()"/> just before </xsl:copy>. I've edited the code above to include it.
Erlock
Nice thought, but he says he has an XSD - the ultimate solution here would be to drive off that :)
Pavel Minaev
Sure, but the ultimate solution you're talking about goes far beyond a mere code snippet like this one. :)
Erlock
Maybe it's the NetBeans processor (XSL Transform...) that I am using, but it seems as though the lookup list inside the stylesheet isn't being read in and processed by the first apply-templates. Even with this change I still get an empty output file using your exact XSL example and input example.
Scanningcrew
Weird... I tested it with two different processors, and it worked well. Maybe you should test it with Saxon or Xalan.
Erlock