Hi, I have the following XML:

<?xml version="1.0" encoding="UTF-8"?>
<SomeName>
  <NodeA>
    DataA
  </NodeA>
  <NodeA>
    DataB
  </NodeA>
  <NodeA>
    DataA
  </NodeA>
  <AnotherNode>
    DataA
  </AnotherNode>
  <AnotherNode>
    DataC
  </AnotherNode>
  <AnotherNode>
    DataC
  </AnotherNode>
  <SingleNode>
    DataA
  </SingleNode>
</SomeName>

And I need to parse through the XML, removing any nodes that have the same name as well as the same content. The problem is that the duplicates are scattered throughout the document, and I don't have a list of node names or specific contents I want to get rid of.

Basically, my output should look like this:

<?xml version="1.0" encoding="UTF-8"?>
<SomeName>
  <NodeA>
    DataA
  </NodeA>
  <NodeA>
    DataB
  </NodeA>
  <AnotherNode>
    DataA
  </AnotherNode>
  <AnotherNode>
    DataC
  </AnotherNode>
  <SingleNode>
    DataA
  </SingleNode>
</SomeName>

Anyone got some clever XSLT?

Thanks!

A:

With proper input:

<SomeName>
    <NodeA>DataA</NodeA>
    <NodeA>DataB</NodeA>
    <NodeA>DataA</NodeA>
    <AnotherNode>DataA</AnotherNode>
    <AnotherNode>DataC</AnotherNode>
    <AnotherNode>DataC</AnotherNode>
    <SingleNode>DataA</SingleNode>
</SomeName>

This stylesheet:

<xsl:stylesheet
          xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
          version="1.0">
    <xsl:output indent="yes"/>
    <xsl:key name="nodes" match="SomeName/*" use="concat(name(),'&amp;',.)"/>
    <xsl:template match="SomeName">
        <xsl:copy>
            <xsl:copy-of select="*[count(.|key('nodes',concat(name(),'&amp;',.))[1])=1]"/>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>

Result:

<SomeName>
    <NodeA>DataA</NodeA>
    <NodeA>DataB</NodeA>
    <AnotherNode>DataA</AnotherNode>
    <AnotherNode>DataC</AnotherNode>
    <SingleNode>DataA</SingleNode>
</SomeName>
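The key compares each node's string value exactly, whitespace included. That is why the answer works on trimmed input: an indented `<NodeA>` holding DataA and a compact `<NodeA>DataA</NodeA>` would not count as duplicates. If the indented XML from the question has to be processed as-is, one possible adjustment (an assumption, not part of the answer) is to key on `normalize-space(.)` instead of `.` in both places: `use="concat(name(),'&amp;',normalize-space(.))"` and `*[count(.|key('nodes',concat(name(),'&amp;',normalize-space(.)))[1])=1]`.

The Python sketch below illustrates the difference using only the standard library's `xml.etree.ElementTree`. `dedupe` and `normalize_space` are hypothetical helpers, and a first-seen set stands in for the key lookup. `" ".join(s.split())` only approximates XPath `normalize-space()`.

```python
import xml.etree.ElementTree as ET

# Question-style input: the duplicate DataA / DataC nodes differ only in whitespace.
INDENTED = """<SomeName>
  <NodeA>
    DataA
  </NodeA>
  <NodeA>DataB</NodeA>
  <NodeA>DataA</NodeA>
  <AnotherNode>
      DataC
  </AnotherNode>
  <AnotherNode>DataC</AnotherNode>
</SomeName>"""


def normalize_space(s):
    # Approximates XPath normalize-space(): trim both ends and collapse
    # internal whitespace runs to a single space.
    return " ".join(s.split())


def dedupe(xml_text, normalize):
    """Keep the first child of the root for each (name, content) pair."""
    root = ET.fromstring(xml_text)
    seen = set()
    kept = []
    for child in root:
        # "".join(itertext()) is the element's string value, like "." in XPath.
        content = "".join(child.itertext())
        if normalize:
            content = normalize_space(content)
        key = (child.tag, content)
        if key not in seen:
            seen.add(key)
            # Report the text trimmed so the results are easy to read.
            kept.append((child.tag, normalize_space(content)))
    return kept


print(len(dedupe(INDENTED, normalize=False)))  # 5: whitespace hides the duplicates
print(dedupe(INDENTED, normalize=True))
```

With exact comparison nothing is removed, while the normalized key collapses the list to one NodeA/DataA, one NodeA/DataB and one AnotherNode/DataC.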

With the "Identity Transform":

<xsl:stylesheet
          xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
          version="1.0">
    <xsl:output indent="yes"/>
    <xsl:key name="nodes" match="SomeName/*" use="concat(name(),'&amp;',.)"/>
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>
    <xsl:template match="SomeName/*[count(.|key('nodes',concat(name(),'&amp;',.))[1])!=1]"/>
</xsl:stylesheet>
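The identity-transform version copies everything unchanged and only suppresses the later duplicates, so attributes, other children and text all survive. Below is a rough Python analogue, assuming the standard library's `xml.etree.ElementTree` and the same exact name-plus-content comparison as the stylesheet. It removes duplicate children in place and leaves the rest of the tree alone. `remove_duplicates` is a hypothetical helper, and the `id` attribute on the root is made up only to show that untouched content is kept.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<SomeName id="demo">
    <NodeA>DataA</NodeA>
    <NodeA>DataB</NodeA>
    <NodeA>DataA</NodeA>
    <AnotherNode>DataA</AnotherNode>
    <AnotherNode>DataC</AnotherNode>
    <AnotherNode>DataC</AnotherNode>
    <SingleNode>DataA</SingleNode>
</SomeName>"""


def remove_duplicates(xml_text):
    root = ET.fromstring(xml_text)
    seen = set()
    # Iterate over a snapshot because children are removed inside the loop.
    for child in list(root):
        # Same grouping idea as the stylesheet: element name plus string value.
        key = (child.tag, "".join(child.itertext()))
        if key in seen:
            # Counterpart of the empty template: the later duplicate is dropped.
            root.remove(child)
        else:
            seen.add(key)
    # Everything not removed (root attributes included) is serialized as-is.
    return ET.tostring(root, encoding="unicode")


print(remove_duplicates(SAMPLE))
```

As with the stylesheet, attributes are not part of the key here, so two nodes with the same name and text but different attributes would still be treated as duplicates.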

Edit: Added an example with the "identity transform" in case more work has to be done.

Note: this is the Muenchian method of grouping.
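In the Muenchian method, `xsl:key` indexes every child of `SomeName` by the string name + `&` + content, and `key(...)[1]` is the first node in document order with that key. A node-set union never contains the same node twice. So `count(. | key(...)[1])` is 1 exactly when the current node is that first node.

Below is a Python emulation of those two steps, assuming the standard library's `xml.etree.ElementTree`. `key_value` and `muenchian_select` are hypothetical names, and a set of object ids stands in for the XPath node-set union.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

PROPER_INPUT = """<SomeName>
    <NodeA>DataA</NodeA>
    <NodeA>DataB</NodeA>
    <NodeA>DataA</NodeA>
    <AnotherNode>DataA</AnotherNode>
    <AnotherNode>DataC</AnotherNode>
    <AnotherNode>DataC</AnotherNode>
    <SingleNode>DataA</SingleNode>
</SomeName>"""


def key_value(elem):
    # Mirrors concat(name(), '&', .): '&' can never appear in an element
    # name, so the separator keeps name and content apart unambiguously.
    return elem.tag + "&" + "".join(elem.itertext())


def muenchian_select(root):
    # Step 1, like <xsl:key name="nodes" match="SomeName/*" ...>:
    # map each key string to its nodes, in document order.
    index = defaultdict(list)
    for child in root:
        index[key_value(child)].append(child)

    # Step 2, like *[count(. | key('nodes', ...)[1]) = 1]:
    # the union of the current node and its group's first node has a
    # single member only when both are the same node.
    selected = []
    for child in root:
        first = index[key_value(child)][0]
        union = {id(child), id(first)}
        if len(union) == 1:
            selected.append(child)
    return selected


for elem in muenchian_select(ET.fromstring(PROPER_INPUT)):
    print(elem.tag, elem.text)
```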

Alejandro
+1 for the precise solution. The only minor issue I see is the use of `count()` instead of `generate-id()`; I think `generate-id()` may be faster on most XSLT processors.
Dimitre Novatchev
@Dimitre: Thanks! I really don't know which one is faster. I tend to think that ID generation plus string manipulation is slower. I should test it. But that's not really the reason. I just like set theory!
Alejandro
Fantastic. Thanks a lot!
Grinner
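For reference, the `generate-id()` form mentioned in the comments is the other common way to write the same Muenchian test: `*[generate-id() = generate-id(key('nodes',concat(name(),'&amp;',.))[1])]`. Both forms select the first node of each group, and the thread leaves open which one is faster on a given processor.

The Python sketch below only illustrates that the two tests agree, assuming the standard library's `xml.etree.ElementTree`. `make_generate_id` and `compare_tests` are hypothetical. The IDs here are based on document position, whereas real processors produce their own opaque IDs.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<SomeName>
    <NodeA>DataA</NodeA>
    <NodeA>DataB</NodeA>
    <NodeA>DataA</NodeA>
    <AnotherNode>DataA</AnotherNode>
    <AnotherNode>DataC</AnotherNode>
    <AnotherNode>DataC</AnotherNode>
    <SingleNode>DataA</SingleNode>
</SomeName>"""


def make_generate_id(root):
    # Hypothetical stand-in for XSLT generate-id(): a string that is unique
    # per node and stable for one run, here derived from document order.
    ids = {elem: "id%d" % pos for pos, elem in enumerate(root.iter())}
    return lambda elem: ids[elem]


def key_value(elem):
    # Same key as the stylesheet: name, '&', string value.
    return elem.tag + "&" + "".join(elem.itertext())


def compare_tests(root):
    generate_id = make_generate_id(root)
    index = {}
    for child in root:
        index.setdefault(key_value(child), []).append(child)

    by_count, by_generate_id = [], []
    for child in root:
        first = index[key_value(child)][0]
        # count(. | first) = 1: a union of one node with itself has size 1.
        if len({id(child), id(first)}) == 1:
            by_count.append(child)
        # generate-id() = generate-id(first): the same node has the same ID.
        if generate_id(child) == generate_id(first):
            by_generate_id.append(child)
    return by_count, by_generate_id


by_count, by_generate_id = compare_tests(ET.fromstring(SAMPLE))
print([e.tag for e in by_count] == [e.tag for e in by_generate_id])
```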