How can we filter an XML document based on another XML document? I have to remove every element that does not appear in the lookup XML. The input XML and the lookup XML have the same root element, and we are using XSLT 1.0.

Example input

<Root>
    <E1 a="1">V1</E1>
    <E2>V2</E2>
    <E3>V3</E3>
    <E5>
       <SE51>SEV1</SE51> 
       <SE52>SEV2</SE52> 
    </E5>
    <E6>
       <SE61>SEV3</SE61> 
       <SE62>SEV4</SE62> 
    </E6>
</Root>

Filter Xml

<Root>
    <E1 a="1"></E1>
    <E2></E2>
    <E5>
       <SE51></SE51> 
       <SE52></SE52> 
    </E5>
</Root>

Expected Output

<Root>
    <E1 a="1">V1</E1>
    <E2>V2</E2>
    <E5>
       <SE51>SEV1</SE51> 
       <SE52>SEV2</SE52> 
    </E5>
</Root>
A: 

Hmmm, you're sort of talking about merging (assuming your filter doc is variable). There's a couple of possibilities which vary with the language you're implementing all of this in. Could you provide more info about the app?

Otherwise I suggest a quick google on "xslt +merge" and see if some result there grabs you.

annakata
Our input XML document comes from a different source, but we are not interested in all of its details. We also store the XML in our DB, so we decided to remove all the elements that are not necessary for us. We are reducing the actual input to a document with less detail while keeping the same structure.
gk
If there's a co-dependent algorithm then you're looking at a merge; if there's a dependent one it's a simple transform; if there's neither, it can't be done.
annakata
A: 

Based on what I've done in the past when faced with similar problems I'd suggest:

  • Write a transformation in XSLT that consumes the "filter XML" and produces a second transformation (also in XSLT).
  • Run the resulting XSLT on your input.

It sounds (and is) ugly, but I've found this easier than trying to interpret the filter description on the fly while transforming the input.
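
As a rough illustration of the first step, here is a minimal sketch of such a generating stylesheet, to be run against the filter XML. The `out` prefix and the `urn:generated-xslt` namespace are placeholders chosen for this example, and the sketch assumes that elements are identified by their path of element names only (attributes in the filter are ignored).

```xml
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:out="urn:generated-xslt">
    <!-- "out" is a placeholder prefix (an assumption of this sketch);
         it is written to the result as the XSLT namespace -->
    <xsl:namespace-alias stylesheet-prefix="out" result-prefix="xsl"/>
    <xsl:output indent="yes"/>

    <xsl:template match="/">
      <out:stylesheet version="1.0">
        <out:output omit-xml-declaration="yes" indent="yes"/>
        <out:strip-space elements="*"/>

        <!-- Generated default rule: drop every element not listed in the filter -->
        <out:template match="*"/>

        <!-- One generated template per element of the filter document -->
        <xsl:apply-templates select="//*"/>
      </out:stylesheet>
    </xsl:template>

    <xsl:template match="*">
      <out:template>
        <!-- Build an absolute match pattern such as /Root/E5/SE51 -->
        <xsl:attribute name="match">
          <xsl:for-each select="ancestor-or-self::*">
            <xsl:value-of select="concat('/', name())"/>
          </xsl:for-each>
        </xsl:attribute>
        <!-- Generated body: copy the element and its attributes, then
             process its children (text is copied by the built-in rule) -->
        <out:copy>
          <out:copy-of select="@*"/>
          <out:apply-templates/>
        </out:copy>
      </out:template>
    </xsl:template>
</xsl:stylesheet>
```

Running this against the filter XML should yield a stylesheet containing one template per allowed path (for example `match="/Root/E5/SE51"`) plus the empty `match="*"` rule. The path templates have a higher default priority than `*`, so in the second step only the listed elements survive.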

bendin
+3  A: 

Here is the required transformation:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:z="inline:text.xml"
 exclude-result-prefixes="z"
 >
    <xsl:output omit-xml-declaration="yes" indent="yes"/>
    <xsl:strip-space elements="*"/>

    <z:filter>
     <Root>
      <E1 a="1"></E1>
      <E2></E2>
      <E5>
       <SE51></SE51>
       <SE52></SE52>
      </E5>
     </Root>
    </z:filter>

    <xsl:variable name="vFilter" select=
     "document('')/*/z:filter"/>

    <xsl:template match="/">
      <xsl:apply-templates select="*[name()=name($vFilter/*)]">
        <xsl:with-param name="pFiltNode" select="$vFilter/*"/>
      </xsl:apply-templates>
    </xsl:template>

    <xsl:template match="*">
      <xsl:param name="pFiltNode"/>

      <xsl:copy>
       <xsl:copy-of select="@*"/>

       <xsl:for-each select="text() | *">
         <xsl:choose>
           <xsl:when test="self::text()">
             <xsl:copy-of select="."/>
           </xsl:when>
           <xsl:otherwise>
            <xsl:variable name="vFiltNode"
                 select="$pFiltNode/*[name()=name(current())]"/>

            <xsl:apply-templates select="self::node()[$vFiltNode]">
              <xsl:with-param name="pFiltNode" select="$vFiltNode"/>
            </xsl:apply-templates>
           </xsl:otherwise>
         </xsl:choose>
       </xsl:for-each>
      </xsl:copy>
    </xsl:template>
</xsl:stylesheet>

When this transformation is applied to the following XML document (the original one plus the addition of <SE511>SEV11</SE511> to demonstrate that the filtering works on any level):

<Root>
    <E1 a="1">V1</E1>
    <E2>V2</E2>
    <E3>V3</E3>
    <E5>
     <SE51>SEV1</SE51>
     <SE511>SEV11</SE511>
     <SE52>SEV2</SE52>
    </E5>
    <E6>
     <SE61>SEV3</SE61>
     <SE62>SEV4</SE62>
    </E6>
</Root>

the wanted result is produced:

<Root>
    <E1 a="1">V1</E1>
    <E2>V2</E2>
    <E5>
     <SE51>SEV1</SE51>
     <SE52>SEV2</SE52>
    </E5>
</Root>

Do notice the following details of this solution:

  1. Templates are applied only to elements that have a matching node in the filter document; the text nodes of such elements are copied as they are.
  2. The template that matches an element receives the corresponding node of the filter document as a parameter.
  3. Before templates are applied to a child element, its corresponding filter node is looked up by name and passed as that parameter; a child with no corresponding filter node is skipped.
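
If the filter is kept in a separate document, as in the question, the inline `z:filter` element is not needed. A minimal sketch, assuming the filter is stored in a file named `filter.xml` (a hypothetical name) next to the stylesheet, is to replace the `vFilter` definition with:

```xml
<!-- Assumption: filter.xml is a hypothetical file name; a relative URI
     given as a string is resolved against the location of the stylesheet -->
<xsl:variable name="vFilter" select="document('filter.xml')"/>
```

Here `$vFilter` is the root node of the filter document, so `$vFilter/*` still selects its `Root` element and the rest of the transformation can stay unchanged. The `z:filter` block, the `xmlns:z` declaration and the `exclude-result-prefixes` attribute can then be dropped.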

Do enjoy!

Dimitre Novatchev