Been using XML for ages now for data storage & transfer, but have never had to validate or transform it. Currently starting a new project and making some design decisions and need to know some rudimentary things about XSL & Schemas.

Our XML is like this (excuse the boring book example :) ):

<Books>
  <Book>
    <ID>1</ID>
    <Name>Book1</Name>
    <Price>24.??</Price>
    <Country>US</Country>
  </Book>
  <Book>
    <ID>1</ID>
    <Name></Name>
    <Price>24.69</Price>
  </Book>
</Books>

Our requirements:

  1. Transformation

    a) Turn "US" into United States
    b) if Price > 20 create a new element <Expensive>True</Expensive>

    I'm guessing this is done with XSLT, but can anyone give me some pointers on how to achieve this?

  2. Validation

    a) is ID an integer, is Price a float (the most important job to be honest)
    b) Are all tags filled, e.g. the name tag is not filled (2nd most important)
    c) Are all tags present, e.g. Country is missing for book 2
    d) [Probably tricky] Is the ID element unique through all books? (nice to have)

From what I have read this is done with a Schema or Relax NG, but can the results of the validation be output to a simple HTML page that displays a list of errors?

e.g.
Book 1: Price "24.??" is not float
Book 2: ID is not unique, Name empty, Country missing

Or would it be better to do these things programmatically in C#? Thanks.

A: 

This stylesheet:

<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:m="map"
exclude-result-prefixes="m">
    <xsl:key name="kTestIntID" match="Book"
             use="number(ID)=number(ID) and not(contains(ID,'.'))"
             m:message="Books with no integer ID"/>
    <xsl:key name="kTestFloatPrice" match="Book"
             use="number(Price)=number(Price) and contains(Price,'.')"
             m:message="Books with no float Price"/>
    <xsl:key name="kTestEmptyElement" match="Book"
             use="not(*[not(node())])"
             m:message="Books with empty element"/>
    <xsl:key name="kTestAllElements" match="Book"
             use="ID and Name and Price and Country"
             m:message="Books with missing element"/>
    <xsl:key name="kBookByID" match="Book" use="ID"/>
    <m:map from="US" to="United States"/>
    <m:map from="CA" to="Canada"/>
    <xsl:variable name="vCountry" select="document('')/*/m:map"/>
    <xsl:variable name="vKeys" select="document('')/*/xsl:key/@name
                                           [starts-with(.,'kTest')]"/>
    <xsl:variable name="vTestNotUniqueID"
                  select="*/*[key('kBookByID',ID)[2]]"/>
    <xsl:template match="/" name="validation">
        <xsl:param name="pKeys" select="$vKeys"/>
        <xsl:param name="pTest" select="$vTestNotUniqueID"/>
        <xsl:param name="pFirst" select="true()"/>
        <xsl:choose>
            <xsl:when test="$pTest and $pFirst">
                <html>
                    <body>
                        <xsl:if test="$vTestNotUniqueID">
                            <h2>Books with no unique ID</h2>
                            <ul>
                                <xsl:apply-templates
                                 select="$vTestNotUniqueID"
                                 mode="escape"/>
                            </ul>
                        </xsl:if>
                        <xsl:variable name="vCurrent" select="."/>
                        <xsl:for-each select="$vKeys">
                            <xsl:variable name="vKey" select="."/>
                            <xsl:for-each select="$vCurrent">
                                <xsl:if test="key($vKey,'false')">
                                    <h2>
                                        <xsl:value-of
                                         select="$vKey/../@m:message"/>
                                    </h2>
                                    <ul>
                                        <xsl:apply-templates
                                         select="key($vKey,'false')"
                                         mode="escape"/>
                                    </ul>
                                </xsl:if>
                            </xsl:for-each>
                        </xsl:for-each>
                    </body>
                </html>
            </xsl:when>
            <xsl:when test="$pKeys">
                <xsl:call-template name="validation">
                    <xsl:with-param name="pKeys"
                     select="$pKeys[position()!=1]"/>
                    <xsl:with-param name="pTest"
                     select="key($pKeys[1],'false')"/>
                </xsl:call-template>
            </xsl:when>
            <xsl:otherwise>
                <xsl:apply-templates/>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:template>
    <xsl:template match="Book" mode="escape">
        <li>
            <xsl:call-template name="escape"/>
        </li>
    </xsl:template>
    <xsl:template match="*" name="escape" mode="escape">
        <xsl:value-of select="concat('&lt;',name(),'&gt;')"/>
        <xsl:apply-templates mode="escape"/>
        <xsl:value-of select="concat('&lt;/',name(),'&gt;')"/>
    </xsl:template>
    <xsl:template match="text()" mode="escape">
        <xsl:value-of select="normalize-space()"/>
    </xsl:template>

    <!-- Up to here, rules for validation.
         From here, rules for transformation -->

    <xsl:template match="@*|node()" name="identity">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>
    <xsl:template match="Country/text()">
        <xsl:variable name="vMatch"
                      select="$vCountry[@from=current()]"/>
        <xsl:choose>
            <xsl:when test="$vMatch">
                <xsl:value-of select="$vMatch/@to"/>
            </xsl:when>
            <xsl:otherwise>
                <xsl:value-of select="."/>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:template>
    <xsl:template match="Price[. > 20]">
        <xsl:call-template name="identity"/>
        <Expensive>True</Expensive>
    </xsl:template>
</xsl:stylesheet>

With your input, output:

<html>
<body>
<h2>Books with no unique ID</h2>
<ul>
<li>&lt;Book&gt;&lt;ID&gt;1&lt;/ID&gt;&lt;Name&gt;Book1&lt;/Name&gt;&lt;Price&gt;24.??&lt;/Price&gt;&lt;Country&gt;US&lt;/Country&gt;&lt;/Book&gt;</li>
<li>&lt;Book&gt;&lt;ID&gt;1&lt;/ID&gt;&lt;Name&gt;&lt;/Name&gt;&lt;Price&gt;24.69&lt;/Price&gt;&lt;/Book&gt;</li>
</ul>
<h2>Books with no float Price</h2>
<ul>
<li>&lt;Book&gt;&lt;ID&gt;1&lt;/ID&gt;&lt;Name&gt;Book1&lt;/Name&gt;&lt;Price&gt;24.??&lt;/Price&gt;&lt;Country&gt;US&lt;/Country&gt;&lt;/Book&gt;</li>
</ul>
<h2>Books with empty element</h2>
<ul>
<li>&lt;Book&gt;&lt;ID&gt;1&lt;/ID&gt;&lt;Name&gt;&lt;/Name&gt;&lt;Price&gt;24.69&lt;/Price&gt;&lt;/Book&gt;</li>
</ul>
<h2>Books with missing element</h2>
<ul>
<li>&lt;Book&gt;&lt;ID&gt;1&lt;/ID&gt;&lt;Name&gt;&lt;/Name&gt;&lt;Price&gt;24.69&lt;/Price&gt;&lt;/Book&gt;</li>
</ul>
</body>
</html>

With proper input:

<Books>
    <Book>
        <ID>1</ID>
        <Name>Book1</Name>
        <Price>19.50</Price>
        <Country>US</Country>
    </Book>
    <Book>
        <ID>2</ID>
        <Name>Book2</Name>
        <Price>24.69</Price>
        <Country>CA</Country>
    </Book>
</Books>

Output:

<Books>
    <Book>
        <ID>1</ID>
        <Name>Book1</Name>
        <Price>19.50</Price>
        <Country>United States</Country>
    </Book>
    <Book>
        <ID>2</ID>
        <Name>Book2</Name>
        <Price>24.69</Price>
        <Expensive>True</Expensive>
        <Country>Canada</Country>
    </Book>
</Books>

Note: Keys are used for performance. This is a proof of concept. In real life, the XHTML output should be wrapped in an xsl:message instruction. From http://www.w3.org/TR/xslt#message:

The xsl:message instruction sends a message in a way that is dependent on the XSLT processor. The content of the xsl:message instruction is a template. The xsl:message is instantiated by instantiating the content to create an XML fragment. This XML fragment is the content of the message.

NOTE: An XSLT processor might implement xsl:message by popping up an alert box or by writing to a log file.

If the terminate attribute has the value yes, then the XSLT processor should terminate processing after sending the message. The default value is no.

Edit: Compacting code and addressing country map issue.

Edit 2: In real life, with big XML documents and more enterprise-grade tools, the best approach would be to run the transformation with an XSLT 2.0 schema-aware processor for validation, or to run validation independently with a well-known schema validator. If for some reason these choices aren't available, don't go with my proof-of-concept answer, because having a key for each validation rule may cause a lot of memory use for big documents. The better way in that last case is to add rules that catch validation errors and end the transformation with a message. As an example, this stylesheet:

<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:m="map"
exclude-result-prefixes="m">
    <xsl:key name="kIDByValue" match="ID" use="."/>
    <m:map from="US" to="United States"/>
    <m:map from="CA" to="Canada"/>
    <xsl:variable name="vCountry" select="document('')/*/m:map"/>
    <xsl:template name="location">
        <xsl:param name="pSteps" select="ancestor-or-self::*"/>
        <xsl:if test="$pSteps">
            <xsl:call-template name="location">
                <xsl:with-param name="pSteps"
                                select="$pSteps[position()!=last()]"/>
            </xsl:call-template>
            <xsl:value-of select="concat('/',
                                         name($pSteps[last()]),
                                         '[',
                                         count($pSteps[last()]/
                                               preceding-sibling::*
                                               [name()=
                                                name($pSteps[last()])])
                                         +1,
                                         ']')"/>
        </xsl:if>
    </xsl:template>
    <xsl:template match="ID[not(number()=number() and not(contains(.,'.')))]">
        <xsl:message terminate="yes">
            <xsl:text>No integer ID at </xsl:text>
            <xsl:call-template name="location"/>
        </xsl:message>
    </xsl:template>
    <xsl:template match="Price[not(number()=number() and contains(.,'.'))]">
        <xsl:message terminate="yes">
            <xsl:text>No float Price at </xsl:text>
            <xsl:call-template name="location"/>
        </xsl:message>
    </xsl:template>
    <xsl:template match="Book/*[not(node())]">
        <xsl:message terminate="yes">
            <xsl:text>Empty element at </xsl:text>
            <xsl:call-template name="location"/>
        </xsl:message>
    </xsl:template>
    <xsl:template match="Book[not(ID and Name and Price and Country)]">
        <xsl:message terminate="yes">
            <xsl:text>Missing element at </xsl:text>
            <xsl:call-template name="location"/>
        </xsl:message>
    </xsl:template>
    <xsl:template match="ID[key('kIDByValue',.)[2]]">
        <xsl:message terminate="yes">
            <xsl:text>Duplicate ID at </xsl:text>
            <xsl:call-template name="location"/>
        </xsl:message>
    </xsl:template>
    <!-- Up to here, rules for validation.
         From here, rules for transformation -->
    <xsl:template match="@*|node()" name="identity">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>
    <xsl:template match="Country/text()">
        <xsl:variable name="vMatch"
                      select="$vCountry[@from=current()]"/>
        <xsl:choose>
            <xsl:when test="$vMatch">
                <xsl:value-of select="$vMatch/@to"/>
            </xsl:when>
            <xsl:otherwise>
                <xsl:value-of select="."/>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:template>
    <xsl:template match="Price[. > 20]">
        <xsl:call-template name="identity"/>
        <Expensive>True</Expensive>
    </xsl:template>
</xsl:stylesheet>

With your input, this message stops the transformation:

Duplicate ID at /Books[1]/Book[1]/ID[1]

With proper input, outputs the same as before.
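
Because terminate="yes" stops at the first problem, only one error is reported per run. If a complete list is wanted, as in the question, one option is to leave terminate at its default of no so that processing continues past each error. A minimal sketch for the Price rule only; the message wording and the numbering of books by position are illustrative assumptions, not taken from the stylesheet above:

```xml
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <!-- Identity rule: copy everything that has no more specific rule -->
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>
    <!-- number() of a non-numeric string is NaN, and NaN never equals itself -->
    <xsl:template match="Price[not(number()=number())]">
        <xsl:message terminate="no">
            <xsl:text>Price is not a number in Book </xsl:text>
            <xsl:value-of select="count(../preceding-sibling::Book)+1"/>
        </xsl:message>
        <!-- Keep the offending element in the output unchanged -->
        <xsl:copy-of select="."/>
    </xsl:template>
</xsl:stylesheet>
```

Where the messages end up (console, log, callback) depends on the XSLT processor, as the quoted note from the specification says.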

Alejandro
Holy cow, I won't pretend to understand all of that yet (looks like XSLT is an acquired artform!), but it does exactly what we want in not too many lines! One additional question: if the country code is a lookup list, e.g. US=United States, CA=Canada, would you still do it the same way?
Andrew White
@Andrew White: See my edit addressing your request and a more real-life suggestion.
Alejandro
A: 

Here is the RelaxNG schema:

<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">

  <start>
    <element name="Books">
      <zeroOrMore>
        <element name="Book">
          <element name="ID"><data type="ID"/></element>
          <element name="Name"><text/></element>
          <element name="Price"><data type="decimal"/></element>
          <element name="Country"><data type="NMTOKEN"/></element>
        </element>
      </zeroOrMore>
    </element>
  </start>

</grammar>

and this is the XML Schema version. (I think.)

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
  <xs:element name="Books">
    <xs:complexType>
      <xs:sequence>
        <xs:element minOccurs="0" maxOccurs="unbounded" ref="Book"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="Book">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="ID"/>
        <xs:element ref="Name"/>
        <xs:element ref="Price"/>
        <xs:element ref="Country"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="ID" type="xs:ID"/>
  <xs:element name="Name" type="xs:string"/>
  <xs:element name="Price" type="xs:decimal"/>
  <xs:element name="Country" type="xs:NMTOKEN"/>
</xs:schema>

Couple of things to note here:

  • The ID simple type will cause validators to check for multiple occurrences of the identifier, and complain if there are any. A disadvantage of the ID type, however, is that its values cannot start with a digit. So A1, A2, ..., An would be fine, but IDs like 1, 2, ..., n would be considered invalid anyway.
  • The price has been set to be of type decimal. Float is never a proper type for financial numbers, because of rounding errors.

Running this through xmllint with the original XML document as input (with modified identifiers) gives:

wilfred$ xmllint --noout --relaxng ./books.rng ./books.xml
./books.xml:5: element Price: Relax-NG validity error : Type decimal doesn't allow value '24.??'
./books.xml:5: element Price: Relax-NG validity error : Error validating datatype decimal
./books.xml:5: element Price: Relax-NG validity error : Element Price failed to validate content
./books.xml:8: element Book: Relax-NG validity error : Expecting an element , got nothing
./books.xml fails to validate
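
If the numeric IDs from the question have to stay as they are, one possible workaround for the ID limitation noted above is to type the element as an integer and state uniqueness with an identity constraint instead. A minimal, untested sketch in XML Schema; the constraint name uniqueBookID and the minLength rule on Name (to reject empty names) are my own assumptions, not part of the schema above:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: assumes IDs are plain integers and Name must be non-empty -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
  <xs:element name="Books">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Book" minOccurs="0" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="ID" type="xs:integer"/>
              <xs:element name="Name">
                <xs:simpleType>
                  <xs:restriction base="xs:string">
                    <xs:minLength value="1"/>
                  </xs:restriction>
                </xs:simpleType>
              </xs:element>
              <xs:element name="Price" type="xs:decimal"/>
              <xs:element name="Country" type="xs:NMTOKEN"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
    <!-- Every Book/ID under Books must have a distinct value -->
    <xs:unique name="uniqueBookID">
      <xs:selector xpath="Book"/>
      <xs:field xpath="ID"/>
    </xs:unique>
  </xs:element>
</xs:schema>
```

With this variant, IDs such as 1, 2, ..., n are accepted, and a repeated value should be reported by the validator as a violation of the uniqueness constraint.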
Wilfred Springer
A: 

On general XSL education, you may find useful an XSL Primer I wrote some years back. It's not current on all the latest trends, but covers the basics of how the XML document is processed.

Steve