views:

293

answers:

6

How do I prevent duplicate entries into a list, and then ideally, sort that list? What I'm doing, is when information at one level is missing, taking the information from a level below it, to building the missing list, in the level above. Currently, I have XML similar to this:

<c03 id="ref6488" level="file">
    <did>
        <unittitle>Clinic Building</unittitle>
        <unitdate era="ce" calendar="gregorian">1947</unitdate>
    </did>
    <c04 id="ref34582" level="file">
        <did>
            <container label="Box" type="Box">156</container>
            <container label="Folder" type="Folder">3</container>
        </did>
    </c04>
    <c04 id="ref6540" level="file">
        <did>
            <container label="Box" type="Box">156</container>
            <unittitle>Contact prints</unittitle>
        </did>
    </c04>
    <c04 id="ref6606" level="file">
        <did>
            <container label="Box" type="Box">154</container>
            <unittitle>Negatives</unittitle>
        </did>
    </c04>
</c03>

I then apply the following XSL:

<xsl:template match="c03/did">
    <xsl:choose>
        <xsl:when test="not(container)">
            <did>
                <!-- If no c03 container item is found, look in the c04 level for one -->
                <xsl:if test="../c04/did/container">

                    <!-- If a c04 container item is found, use the info to build a c03 version -->
                    <!-- Skip c03 container item, if still no c04 items found -->
                    <container label="Box" type="Box">

                        <!-- Build container list -->
                        <!-- Test for more than one item, and if so, list them, -->
                        <!-- separated by commas and a space -->
                        <xsl:for-each select="../c04/did">
                            <xsl:if test="position() &gt; 1">, </xsl:if>
                            <xsl:value-of select="container"/>
                        </xsl:for-each>
                    </container>                    
            </did>
        </xsl:when>

        <!-- If there is a c03 container item(s), list it normally -->
        <xsl:otherwise>
            <xsl:copy-of select="."/>
        </xsl:otherwise>
    </xsl:choose>
</xsl:template>

But I'm getting the "container" result of

<container label="Box" type="Box">156, 156, 154</container>

when what I want is

<container label="Box" type="Box">154, 156</container>

Below is the full result that I'm trying to get:

<c03 id="ref6488" level="file">
    <did>
        <container label="Box" type="Box">154, 156</container>
        <unittitle>Clinic Building</unittitle>
        <unitdate era="ce" calendar="gregorian">1947</unitdate>
    </did>
    <c04 id="ref34582" level="file">
        <did>
            <container label="Box" type="Box">156</container>
            <container label="Folder" type="Folder">3</container>
        </did>
    </c04>
    <c04 id="ref6540" level="file">
        <did>
            <container label="Box" type="Box">156</container>
            <unittitle>Contact prints</unittitle>
        </did>
    </c04>
    <c04 id="ref6606" level="file">
        <did>
            <container label="Box" type="Box">154</container>
            <unittitle>Negatives</unittitle>
        </did>
    </c04>
</c03>

Thanks in advance for any help!

+1  A: 

try using a Key group in xslt, here's an article on the Muenchian method which should help to eliminate duplicates. http://www.jenitennison.com/xslt/grouping/muenchian.html

derek
Jeni's article explains the Meunchian method very nicely for the hard of thinking (like me)!
Chris F
A: 

Try the following code:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:output indent="yes"></xsl:output>

<xsl:template match="node() | @*">
  <xsl:copy>
    <xsl:apply-templates select="node() | @*"/>
  </xsl:copy>
</xsl:template>

  <xsl:template match="c03/did">
    <xsl:choose>
      <xsl:when test="not(container)">
        <did>
          <!-- If no c03 container item is found, look in the c04 level for one -->
          <xsl:if test="../c04/did/container">
            <xsl:variable name="foo" select="../c04/did/container[@type='Box']/text()"/>
            <!-- If a c04 container item is found, use the info to build a c03 version -->
            <!-- Skip c03 container item, if still no c04 items found -->
            <container label="Box" type="Box">

              <!-- Build container list -->
              <!-- Test for more than one item, and if so, list them, -->
              <!-- separated by commas and a space -->
              <xsl:for-each select="distinct-values($foo)">
                <xsl:sort />
                <xsl:if test="position() &gt; 1">, </xsl:if>
                <xsl:value-of select="." />
              </xsl:for-each>
            </container>
            <xsl:apply-templates select="*" />
          </xsl:if>
        </did>
      </xsl:when>

      <!-- If there is a c03 container item(s), list it normally -->
      <xsl:otherwise>
        <xsl:copy-of select="."/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>

It looks pretty much as the output you want:

<?xml version="1.0" encoding="UTF-8"?>
<c03 id="ref6488" level="file">
  <did>
      <container label="Box" type="Box">154, 156</container>
      <unittitle>Clinic Building</unittitle>
      <unitdate era="ce" calendar="gregorian">1947</unitdate>
   </did>
  <c04 id="ref34582" level="file">
      <did>
         <container label="Box" type="Box">156</container>
         <container label="Folder" type="Folder">3</container>
      </did>
  </c04>
  <c04 id="ref6540" level="file">
      <did>
         <container label="Box" type="Box">156</container>
         <unittitle>Contact prints</unittitle>
      </did>
  </c04>
  <c04 id="ref6606" level="file">
      <did>
         <container label="Box" type="Box">154</container>
         <unittitle>Negatives</unittitle>
      </did>
  </c04>
</c03>

The trick is to use <xsl:sort> and distinct-values() together. See the (IMHO) great book from Michael Key "XSLT 2.0 and XPATH 2.0"

Patrick
I'm using XSLT2, so I went with this solution, and it worked great. The only thing, was that I had to comment out the \<xsl:apply-templates select="\*" \/\> For some reason, it was duplicating the "unittitle" nodes. Thanks so much!
LOlliffe
I share your high opinion of Michael Kay's book. Unfortunately, too few people/organizations have switched to XSLT 2.0.
Dimitre Novatchev
A: 

The following XSLT 1.0 transformation does what you are looking for

<xsl:stylesheet 
  version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
> 
  <xsl:output encoding="utf-8" />

  <!-- key to index containers by these three distinct qualities: 
       1: their ancestor <c??> node (represented as its unique ID)
       2: their @type attribute value
       3: their node value (i.e. their text) -->
  <xsl:key 
    name  = "kContainer" 
    match = "container"
    use   = "concat(generate-id(../../..), '|', @type, '|', .)"
  />

  <!-- identity template to copy everything as is by default -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*" />
    </xsl:copy>
  </xsl:template>

  <!-- special template for <did>s without a <container> child -->
  <xsl:template match="did[not(container)]">
    <xsl:copy>
      <xsl:copy-of select="@*" />
      <container label="Box" type="Box">
        <!-- from subordinate <container>s of type Box, use the ones
             that are *the first* to have that certain combination 
             of the three distinct qualities mentioned above -->
        <xsl:apply-templates mode="list-values" select="
          ../*/did/container[@type='Box'][
            generate-id()
            =
            generate-id(
              key(
                'kContainer', 
                concat(generate-id(../../..), '|', @type, '|', .)
              )[1]
            )
          ]
        ">
          <!-- sort them by their node value -->
          <xsl:sort select="." data-type="number" />
        </xsl:apply-templates>
      </container>
      <xsl:apply-templates select="node()" />
    </xsl:copy>
  </xsl:template>

  <!-- generic template to make list of values from any node-set -->
  <xsl:template match="*" mode="list-values">
    <xsl:value-of select="." />
    <xsl:if test="position() &lt; last()">
      <xsl:text>, </xsl:text>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>

Returns

<c03 id="ref6488" level="file">
  <did>
    <container label="Box" type="Box">154, 156</container>
    <unittitle>Clinic Building</unittitle>
    <unitdate era="ce" calendar="gregorian">1947</unitdate>
  </did>
  <c04 id="ref34582" level="file">
    <did>
      <container label="Box" type="Box">156</container>
      <container label="Folder" type="Folder">3</container>
    </did>
  </c04>
  <c04 id="ref6540" level="file">
    <did>
      <container label="Box" type="Box">156</container>
      <unittitle>Contact prints</unittitle>
    </did>
  </c04>
  <c04 id="ref6606" level="file">
    <did>
      <container label="Box" type="Box">154</container>
      <unittitle>Negatives</unittitle>
    </did>
  </c04>
</c03>

The generate-id() = generate-id(key(...)[1]) part is what's called Muenchian grouping. Unless you can use XSLT 2.0, this is the way to go.

Tomalak
+1  A: 

There is no need for an XSLT 2.0 solution for this problem.

Here is an XSLT 1.0 solution, which is more compact than the currently selected XSLT 2.0 solution (35 lines vs. 43 lines):

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"&gt;
    <xsl:output omit-xml-declaration="yes" indent="yes"/>
    <xsl:strip-space elements="*"/>

    <xsl:key name="kBoxContainerByVal"
     match="container[@type='Box']" use="."/>

 <xsl:template match="node()|@*">
     <xsl:copy>
       <xsl:apply-templates select="node()|@*"/>
     </xsl:copy>
 </xsl:template>

 <xsl:template match="c03/did[not(container)]">
   <xsl:copy>

   <xsl:variable name="vContDistinctValues" select=
    "/*/*/*/container[@type='Box']
            [generate-id()
            =
             generate-id(key('kBoxContainerByVal', .)[1])
            ]
            "/>

    <container label="Box" type="Box">
      <xsl:for-each select="$vContDistinctValues">
        <xsl:sort data-type="number"/>

        <xsl:value-of select=
        "concat(., substring(', ', 1 + 2*(position() = last())))"/>
      </xsl:for-each>
    </container>
    <xsl:apply-templates/>
   </xsl:copy>
 </xsl:template>
</xsl:stylesheet>

When this transformation is applied on the originally provided XML document, the correct, wanted result is produced:

<c03 id="ref6488" level="file">
   <did>
      <container label="Box" type="Box">156, 154</container>
      <unittitle>Clinic Building</unittitle>
      <unitdate era="ce" calendar="gregorian">1947</unitdate>
   </did>
   <c04 id="ref34582" level="file">
      <did>
         <container label="Box" type="Box">156</container>
         <container label="Folder" type="Folder">3</container>
      </did>
   </c04>
   <c04 id="ref6540" level="file">
      <did>
         <container label="Box" type="Box">156</container>
         <unittitle>Contact prints</unittitle>
      </did>
   </c04>
   <c04 id="ref6606" level="file">
      <did>
         <container label="Box" type="Box">154</container>
         <unittitle>Negatives</unittitle>
      </did>
   </c04>
</c03>

Update:

I didn't notice the requirement that the container numbers must appear sorted. Now the solution reflects this.

Dimitre Novatchev
Your solution does not sort the list, which was requested in the question. Easily remedied by adding `<xsl:sort/>` inside the `xsl:for-each` loop.
markusk
@markusk: Thanks, I am usually sleepy early in the morning. `<xsl:sort/>` also needs `data-type="number"` in this case.
Dimitre Novatchev
+1  A: 

A slightly shorter XSLT 2.0 version, combining approaches from other answers. Note that sorting is alphabetical, so that if the labels "54" and "156" are found, the output will be "156, 54". If a numerical sort is needed, use <xsl:sort select="number(.)"/> instead of <xsl:sort/>.

<xsl:stylesheet version="2.0" 
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"&gt;
    <xsl:output omit-xml-declaration="yes" indent="yes"/> 
    <xsl:strip-space elements="*"/>

    <xsl:template match="node()|@*"> 
        <xsl:copy> 
            <xsl:apply-templates select="node()|@*"/> 
        </xsl:copy> 
    </xsl:template> 

    <xsl:template match="c03/did[not(container)]">
        <xsl:variable name="containers" 
                      select="../c04/did/container[@label='Box'][text()]"/>
        <xsl:copy>
            <xsl:copy-of select="@*"/>
            <xsl:if test="$containers">
                <container label="Box" type="Box">
                    <xsl:for-each select="distinct-values($containers)">
                        <xsl:sort/>
                        <xsl:if test="position() != 1">, </xsl:if>
                        <xsl:value-of select="."/>
                    </xsl:for-each>
                </container> 
            </xsl:if>
            <xsl:apply-templates select="node()"/> 
        </xsl:copy> 
    </xsl:template> 
</xsl:stylesheet>
markusk
+1. This is still the shortest XSLT 2.0 solution! :)
Dimitre Novatchev
+1  A: 

A truly XSLT 2.0 solution, also quite short:

<xsl:stylesheet  version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  exclude-result-prefixes="xs"
>
  <xsl:output omit-xml-declaration="yes" indent="yes"/>

  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="c03/did[not(container)]">
    <xsl:copy>
      <xsl:copy-of select="@*"/>

      <xsl:variable name="vContDistinctValues" as="xs:integer*">
        <xsl:perform-sort select=
          "distinct-values(/*/*/*/container[@type='Box']/text()/xs:integer(.))">
          <xsl:sort/>
        </xsl:perform-sort>
      </xsl:variable>

      <xsl:if test="$vContDistinctValues">
        <container label="Box" type="Box">
          <xsl:value-of select="$vContDistinctValues" separator=","/>
        </container>
      </xsl:if>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>

Do note:

  1. The use of types avoids the need to specify the data-type in <xsl:sort/> .

  2. The use of the separator attribute of <xsl:value-of/>

Dimitre Novatchev
+1 Nicely done. However, do we know that the `c03` element will be the root? The poster only stated that the input would be "similar", so I feel slightly more comfortable with relative XPaths (i.e., `../c04/container`, or perhaps `../*/container`) instead of absolute ones (`/*/*/*/container`). That way, the stylesheet works even if the `c03` element appears further down in the document structure.
markusk
@markusk Nice comment again, and thanks for the upvote! Yes, we witness how OPs constantly change the definition of their problems. Sometimes I am tempted to act as a fortune teller, but this has a risk in itself, too. Also, the XSLT code becomes more and more not too-directly related to the actual XML and thus more difficult to understand. This is why, in cases like this I generally prefer to remain as close to the posted XML as possible. I've written many times most general XSLT code, e.g. the XPath Visualizer/FXSL, but the purpose here is to be as much specific/helpful, as possible. :)
Dimitre Novatchev