I want to use XSLT to transform a set of documents into one structure. I have the transformation working correctly to concatenate the documents. I don't know, however, whether the documents have duplicate entries in them, which I will need to remove.

I need to know how to remove these duplicates (if they exist) by an id attribute. All duplicates will have the same id. I know it will have something to do with keys and generate-id functions.

<root>
    <item id="1001">A</item>
    <item id="1003">C</item>
    <item id="1004">D</item>
    <item id="1002">B</item>
    <item id="1001">A</item>
    <item id="1003">C</item>
    <item id="1004">D</item>
    <item id="1005">E</item>
</root>

I need an XSLT 1.0 transformation for the above, based on the following...

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

Also, would someone be able to explain how it works to me too? Bit of a noob...

Thanks in advance...

+3  A: 

Solutions to this are commonly presented using generate-id(), but personally I prefer a slightly different variation that doesn't use generate-id:-

<xsl:key name="items" match="item" use="@id" />

<xsl:template match="root">
    <root>
        <xsl:copy-of select="item[count(key('items',@id)[1]|.)=1]" />
    </root>
</xsl:template>

First you create a key which holds all the item elements, using the id attribute as the lookup key. The key generates an efficient index which can be used to look up items.

The technique relies on the fact that when you create a node-set using the | operator you get a unique set of nodes. In other words, if the same node is found on both sides of the | operator it only appears in the resulting set once.
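To see the union behaviour in isolation, here is a minimal sketch run against the sample document from the question. The text output method and the choice of item[1] and item[5] (the two items with id 1001) are only for illustration:

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" />

  <xsl:template match="/root">
    <!-- The same node on both sides of the union: the set holds one node. -->
    <xsl:value-of select="count(item[1] | item[1])" />
    <xsl:text> </xsl:text>
    <!-- Two different nodes, even with identical id and text: the set holds two. -->
    <xsl:value-of select="count(item[1] | item[5])" />
  </xsl:template>
</xsl:stylesheet>
```

This prints "1 2". Node identity is what matters to the | operator, not the attribute values or text content.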

The expression:-

 key('items',@id)

Will return the set of item nodes that have a specific ID. So:-

 key('items',@id)[1]

will return only one of the nodes that were found to have that specific ID (the first in document order), and is repeatable (that is, using this expression repeatedly always returns the same node).

Hence the expression:-

 count(key('items',@id)[1]|.)=1

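can only be true for one item node with a specific id value: the union contains a single node when the current node is that first node, and two nodes otherwise.

The copy-of therefore makes a deep copy of exactly one item node for each distinct id.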
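Put together as a complete stylesheet, this might look like the sketch below. The xsl:output element with indent="yes" is an addition for readability and not part of the technique:

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes" />

  <!-- Index every item by the value of its id attribute. -->
  <xsl:key name="items" match="item" use="@id" />

  <xsl:template match="root">
    <root>
      <!-- Keep an item only when it is the first node in its id group:
           the union of that first node and the current node then has size 1. -->
      <xsl:copy-of select="item[count(key('items', @id)[1] | .) = 1]" />
    </root>
  </xsl:template>
</xsl:stylesheet>
```

Run against the sample document in the question, this should output the items with ids 1001, 1003, 1004, 1002 and 1005, in that order: the first occurrence of each id is kept and document order is preserved. The exact whitespace of the result depends on the processor.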

AnthonyWJones
@Anthony: My 0.02$ - while the `count()` approach takes less space, it is also harder to understand. Proof: The long explanation. :) The `generate-id()` approach is less opaque, that's why I would always recommend the latter. There *are* cases where the `count()` way is the only option, but they are few and far between. (edit: still, +1)
Tomalak
+3  A: 

Here is the generate-id() way @AnthonyWJones mentioned. I find this one much easier on the human mind. It makes no difference in the result; choose whichever you like best.

<xsl:stylesheet 
  version="1.0" 
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>
  <xsl:key name="kItemById" match="item" use="@id" />

  <xsl:template match="root">
    <xsl:copy>
      <xsl:copy-of select="
        item[generate-id() = generate-id(key('kItemById', @id)[1])]
      " />
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>

In short:

item[generate-id() = generate-id(key('kItemById', @id)[1])]

means: "All <item>s whose unique ID is equal to the unique ID of the first item with the same @id value".
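The same test can also be inverted and placed in a match pattern, which may be handy if the items are nested inside a larger structure that should otherwise be copied as-is. This is a sketch of that variation, not something from the answers above; it reuses the kItemById key and pairs an identity template with an empty template for the duplicates. The xsl:strip-space and xsl:output lines are optional additions to keep the result tidy:

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:strip-space elements="*" />
  <xsl:output method="xml" indent="yes" />

  <xsl:key name="kItemById" match="item" use="@id" />

  <!-- Identity template: copy every node and attribute unchanged. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()" />
    </xsl:copy>
  </xsl:template>

  <!-- Empty template: an item that is not the first with its id produces no output. -->
  <xsl:template match="item[generate-id() != generate-id(key('kItemById', @id)[1])]" />
</xsl:stylesheet>
```

The second template wins for duplicate items because a pattern with a predicate has a higher default priority than the generic node() pattern. Everything else falls through to the identity template and is copied.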

Tomalak
+1 I agree the use of generate-id is easier to understand.
AnthonyWJones