ansaurus

Question

How can I remove duplicate nodes in XQuery?

Answer 1

A:

I solved my problem by implementing a recursive uniqueness search function, based solely on the text content of my document for uniqueness matching.

declare function ssd:unique-elements($list, $rules, $unique) {
    let $element := subsequence($rules, 1, 1)
    let $return :=
    if ($element) then
        if (index-of($list, $element) >= 1) then
            ssd:unique-elements(insert-before($element, 1, $list), subsequence($rules, 2), $unique)
        else <test>
            <unique>{$element}</unique>
            {ssd:unique-elements(insert-before($element, 1, $list), subsequence($rules, 2), insert-before($element, 1, $unique))/*}
            </test>
    else ()
    return $return
};

Called as follows:

declare function ssd:start2() {
    let $data := ()
    let $sift-this := 
       <test>
           <data>123</data>
           <data>456</data>
           <data>123</data>
           <data>456</data>
           <more-data>456</more-data>
       </test>
    return ssd:unique-elements($data, $sift-this/*, ())/*/*
};

ssd:start2()

output:

<?xml version="1.0" encoding="UTF-8"?>
<data>123</data>
<data>456</data>

I guess if you need slightly different equivalence matching, you can alter the matching in the algorithm accordingly. Should get you started at any rate.

Brabster 2009-03-13 22:50:38

Answer 2

+2 A:

A simpler and more direct one-liner XPath solution:

Just use the following XPath expression:

  /*/book
       [index-of(/*/book/title, 
                 title
                 )
                  [1]
        ]

When applied, for example, on the following XML document:

<books>
    <book>
     <title>XML in 24 hours</title>
     <author>Some Guy</author>
    </book>
    <book>
     <title>Food in Seattle</title>
     <author>Some Guy2</author>
    </book>
    <book>
     <title>XML in 24 hours</title>
     <author>Some Guy</author>
    </book>
    <book>
     <title>Food in Seattle</title>
     <author>Some Guy2</author>
    </book>
    <book>
     <title>How to solve XPAth Problems</title>
     <author>Me</author>
    </book>
</books>

the above XPath expression selects correctly the following nodes:

<book>
 <title>XML in 24 hours</title>
 <author>Some Guy</author>
</book>
<book>
 <title>Food in Seattle</title>
 <author>Some Guy2</author>
</book>
<book>
 <title>How to solve XPAth Problems</title>
 <author>Me</author>
</book>

The explanation is simple: For every book, select only one of its occurences -- such that its index in all-books is the same as the first index of its title in all-titles.

Dimitre Novatchev 2009-03-20 14:01:46

Hey Dimitre, thanks for the answer; but if I understand correctly, it depends on all the elements having the same structure which is the built into the query - for example it would show two nodes the same if they had the same title and different authors...

Brabster 2009-03-20 21:56:37

@Brabster It is not at all clear from your question how the test for inequality/uniqueness should be defined. If you define it, it will help you to find a simpler solution

Dimitre Novatchev 2009-03-21 14:58:58

Answer 3

+1 A:

You can use the built-in distinct-values() function...

Travis J Webb 2010-03-11 05:44:51

Answer 4

+1 A:

What about fn:distinct-values?

Matthias 2010-05-29 19:02:42

Answer 5

A:

A solution inspired by functional programming. This solution is extensible in that you can replace the "=" comparison by your custom-built boolean local:compare($element1, $element2) function. This function has worst-case quadratic complexity in the length of the list. You could get n(log n) complexity by sorting the list before-hand and only comparing with the immediate successor.

To my best knowledge, the fn:distinct-values (or fn:distinct-elements) functions does not allow to use a custom-built comparison function.

declare function local:deduplicate($list) {
  if (fn:empty($list)) then ()
  else 
    let $head := $list[1],
      $tail := $list[position() > 1]
    return
      if (fn:exists($tail[ . = $head ])) then local:deduplicate($tail)
      else ($head, local:deduplicate($tail))
};

let $list := (1,2,3,4,1,2,1) return local:deduplicate($list)

Benedikt 2010-07-01 09:31:42

ansaurus

tags:

views:

answers:

How can I remove duplicate nodes in XQuery?

related questions