views:

520

answers:

5

I use recursive merge sort for sorting a link list, but during the merge sort I would like to delete duplicates. Anyone has insight in how to accomplish this?

I am using C code.

+5  A: 

To use merge sort to remove duplicates, you would ignore elements that are repeated in the merging process.

monksy
This reply answered the question recusively.
pierr
take a look at the instructions for Mergesort.
monksy
+6  A: 

In merge sort you take two (or more) already-sorted lists repeatedly apply the following rules:

  • find the lesser/least of the items of the top of each of the input lists, choosing any of the lowest items if there is a tie
  • remove that item from its list
  • add it to your output list

To remove duplicates, you simply modify the rules very slightly:

  • find the lesser/least of the items of the top of each of the input lists, choosing any of the lowest items if there is a tie
  • remove that item from its list
  • if it is the same as the last item you added to your output list, throw it away
  • otherwise, add it to your output list

This will ensure that no two consecutive items on your output list are the same, and that the items in it are in order, which is what you were after.

Tim
A: 

Or just use any sorting and when it completes, scan over the sorted list and remove duplicated elements (they naturally will be next to each other)

Dmitry
+1  A: 

Consider the merge function within mergesort.

During the merge process, you are of course comparing elements with one another.

Convince yourself that, if you're merging 2 sorted lists A and B, and if both lists contain the same value x, then it will happen that the two identical elements will be compared with one another. If you want a proof, my approach would be to show that if there is a case where two identical elements are not compared, then one or both of the lists are in fact unsorted. Proof by contradiction, baby!

So you can easily detect cases whereby there are two identical elements in two lists being merged.

Next, convince yourself that if there are two identical elements in two lists not being merged just now, that eventually they will be merged together and the identical elements will be detected. That's basically a proof by induction --- if nothing else, clearly the very last merge (merging sorted lists of length n/2 and n/2 into the final list of length n) will bring those elements together.

Lastly, convince yourself that there cannot exist a singular list with two of the same element, if you recurse to the n = 1 or n = 0 case. This is again inductive-ish because of course any larger list will first have to survive the "filtering" process described in the first big paragraph.

If you are convinced of those three things, then it will be apparent that Steven's or Tim's solutions are quite suitable.

Agor
A: 

As others pointed out, you will need some modification to the merge process to fit your need. Below is the modified merge() function for your reference (original is here)

function merge(left,right)
var list result
while length(left) > 0 and length(right) > 0
    if first(left) < first(right)    // <--- change from <= to <
        append first(left) to result
        left = rest(left)
    else if first(left) > first(right)
        append first(right) to result
        right = rest(right)
    else        // <----- added case to remove duplicated items
        append first(right) to result
        left = rest(left)
        right = rest(right)
    end

end while
if length(left) > 0 
    append left to result
else  
    append right to result
return result
pierr