We have XML documents that contain many flag nodes such as isProduct, isActive, and isMandatory, where the node text is either True or False.

We need to transform these documents, keeping their structure but converting the flag nodes into a verbal representation like this:

<isProduct>True</isProduct>   ===>   <Type>Product</Type>
<isProduct>False</isProduct>  ===>   <Type/>

And the same for other flag nodes.

We are seeking an extensible and scalable solution that can be configured with minimum friction after deployment.

By extensible, we mean that there will be more cases, such as two flags that together represent a status. For example, isEmployee and isCustomer are used in the document to represent four differently named things, so each of the four possible combinations should be translated into a single string: "Employee", "Customer", "Customer-Employee" or "None".

By scalable, we mean that it can be used to process any XML document without prior knowledge of its schema and with no restriction on the document size.

We understand that this might be done using XSLT. Can we write an XSLT stylesheet that accepts any document and produces the same document with nodes added or updated?

+1  A: 

Here's a solution in XSLT based on the identity transformation:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="node() | @*">
    <xsl:copy>
        <xsl:apply-templates select="node() | @*"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="isProduct">
    <xsl:choose>
      <xsl:when test=". = 'True'"><Type>Product</Type></xsl:when>
      <xsl:otherwise><Type/></xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
mkoeller
So how is this going to handle the isEmployee node?
Mark Dickinson
Either add a similar template matching "isEmployee" or make the given template generic as in Tomalak's answer.
mkoeller
+2  A: 

Assuming an input like this:

<gizmo>
  <isProduct>True</isProduct>
  <isFoo>False</isFoo>
  <isBar>True</isBar>
</gizmo>

The generic approach would be:

<xsl:template match="gizmo">
  <xsl:copy>
    <xsl:apply-templates select="*" />
  </xsl:copy>
</xsl:template>

<xsl:template match="*[substring(local-name(), 1, 2) = 'is']">
  <Type>
    <xsl:if test=". = 'True'">
      <xsl:value-of select="substring-after(local-name(), 'is')" />
    </xsl:if>
  </Type>
</xsl:template>

Which produces:

<gizmo>
  <Type>Product</Type>
  <Type />
  <Type>Bar</Type>
</gizmo>
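
Where the label cannot be derived from the element name, or where new flags should be added without touching the templates, a lookup table can drive the same transformation. The following is only a minimal sketch under stated assumptions: the cfg namespace URI, the cfg:flag entries and their name/target/label attributes are all hypothetical, flag names are assumed to be unique in the table, and document('') relies on the processor being able to re-read the stylesheet from its URI.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:cfg="urn:example:flag-config"
    exclude-result-prefixes="cfg">

  <!-- hypothetical lookup table: one entry per flag element -->
  <cfg:flags>
    <cfg:flag name="isProduct" target="Type"  label="Product"/>
    <cfg:flag name="isActive"  target="State" label="Active"/>
  </cfg:flags>

  <!-- the table entries, read back from the stylesheet itself -->
  <xsl:variable name="flags" select="document('')/*/cfg:flags/cfg:flag"/>

  <!-- identity transformation for everything that is not a configured flag -->
  <xsl:template match="node() | @*">
    <xsl:copy>
      <xsl:apply-templates select="node() | @*"/>
    </xsl:copy>
  </xsl:template>

  <!-- any element whose name is listed in the table -->
  <xsl:template match="*[local-name() = document('')/*/cfg:flags/cfg:flag/@name]">
    <xsl:variable name="entry" select="$flags[@name = local-name(current())]"/>
    <!-- replace the flag with its configured target element -->
    <xsl:element name="{$entry/@target}">
      <!-- emit the label only when the flag is set; otherwise leave it empty -->
      <xsl:if test=". = 'True'">
        <xsl:value-of select="$entry/@label"/>
      </xsl:if>
    </xsl:element>
  </xsl:template>
</xsl:stylesheet>

With this layout, supporting another flag means adding one cfg:flag line. The table could also live in a separate file loaded with document('flags.xml') (a hypothetical file name), so that it can be changed after deployment without editing the stylesheet.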

An even more generalized approach uses a (heavily) modified identity transform:

<!-- the identity template... well, sort of -->
<xsl:template match="node() | @*">
  <xsl:copy>
    <!-- all element-type children that begin with 'is' -->
    <xsl:variable name="typeNodes"  select="
      *[substring(local-name(), 1, 2) = 'is']
    " />

    <!-- all other children (incl. elements that don't begin with 'is') -->
    <xsl:variable name="otherNodes" select="
      @* | node()[not(self::*) or self::*[substring(local-name(), 1, 2) != 'is']]
    " />

    <!-- identity transform all the "other" nodes -->
    <xsl:apply-templates select="$otherNodes" />

    <!-- collapse all the "type" nodes into a string -->
    <xsl:if test="$typeNodes">
      <Type>
        <xsl:variable name="typeString">
          <xsl:apply-templates select="$typeNodes" />
        </xsl:variable>
        <xsl:value-of select="substring-after($typeString, '-')" />
      </Type>
    </xsl:if>
  </xsl:copy>
</xsl:template>

<!-- this collapses all the "type" nodes into a string -->
<xsl:template match="*[substring(local-name(), 1, 2) = 'is']">
  <xsl:if test=". = 'True'">
    <xsl:text>-</xsl:text>
    <xsl:value-of select="substring-after(local-name(), 'is')" />
  </xsl:if>
</xsl:template>

<!-- prevent the output of empty text nodes -->
<xsl:template match="text()">
  <xsl:if test="normalize-space() != ''">
    <xsl:value-of select="." />
  </xsl:if>
</xsl:template>

The above takes any XML input and outputs the same structure, except that within each parent the elements named <is*> are collapsed into a single <Type> node containing a dash-delimited string:

<!-- in -->
<foo>
  <fancyNode />
  <gizmo>
    <isProduct>True</isProduct>
    <isFoo>False</isFoo>
    <isBar>True</isBar>
  </gizmo>
</foo>

<!-- out -->
<foo>
  <fancyNode />
  <gizmo>
    <Type>Product-Bar</Type>
  </gizmo>
</foo>
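
The dash-delimited output above follows document order and yields an empty <Type> when no flag is set. For the isEmployee/isCustomer case from the question, which needs fixed labels including "None", an explicit pair of templates may be simpler. This is a minimal sketch, assuming both flags are siblings under the same parent and that the result goes into a <Status> element (a hypothetical name; the question does not specify one):

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- identity transformation: copy everything not matched below -->
  <xsl:template match="node() | @*">
    <xsl:copy>
      <xsl:apply-templates select="node() | @*"/>
    </xsl:copy>
  </xsl:template>

  <!-- isEmployee drives the output; its sibling isCustomer is read here -->
  <xsl:template match="isEmployee">
    <xsl:variable name="employee" select=". = 'True'"/>
    <xsl:variable name="customer" select="../isCustomer = 'True'"/>
    <!-- Status is a hypothetical element name -->
    <Status>
      <xsl:choose>
        <xsl:when test="$employee and $customer">Customer-Employee</xsl:when>
        <xsl:when test="$employee">Employee</xsl:when>
        <xsl:when test="$customer">Customer</xsl:when>
        <xsl:otherwise>None</xsl:otherwise>
      </xsl:choose>
    </Status>
  </xsl:template>

  <!-- isCustomer was already consumed by the template above, so emit nothing -->
  <xsl:template match="isCustomer[../isEmployee]"/>
</xsl:stylesheet>

An isCustomer element that has no isEmployee sibling is not matched by the last template and is simply copied by the identity template.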
Tomalak
The copy template will also strip any attributes from XML documents.
mkoeller
Tomalak