I use recursive merge sort for sorting a link list, but during the merge sort I would like to delete duplicates. Anyone has insight in how to accomplish this?
I am using C code.
I use recursive merge sort for sorting a link list, but during the merge sort I would like to delete duplicates. Anyone has insight in how to accomplish this?
I am using C code.
To use merge sort to remove duplicates, you would ignore elements that are repeated in the merging process.
In merge sort you take two (or more) already-sorted lists repeatedly apply the following rules:
To remove duplicates, you simply modify the rules very slightly:
This will ensure that no two consecutive items on your output list are the same, and that the items in it are in order, which is what you were after.
Or just use any sorting and when it completes, scan over the sorted list and remove duplicated elements (they naturally will be next to each other)
Consider the merge function within mergesort.
During the merge process, you are of course comparing elements with one another.
Convince yourself that, if you're merging 2 sorted lists A and B, and if both lists contain the same value x, then it will happen that the two identical elements will be compared with one another. If you want a proof, my approach would be to show that if there is a case where two identical elements are not compared, then one or both of the lists are in fact unsorted. Proof by contradiction, baby!
So you can easily detect cases whereby there are two identical elements in two lists being merged.
Next, convince yourself that if there are two identical elements in two lists not being merged just now, that eventually they will be merged together and the identical elements will be detected. That's basically a proof by induction --- if nothing else, clearly the very last merge (merging sorted lists of length n/2 and n/2 into the final list of length n) will bring those elements together.
Lastly, convince yourself that there cannot exist a singular list with two of the same element, if you recurse to the n = 1 or n = 0 case. This is again inductive-ish because of course any larger list will first have to survive the "filtering" process described in the first big paragraph.
If you are convinced of those three things, then it will be apparent that Steven's or Tim's solutions are quite suitable.
As others pointed out, you will need some modification to the merge
process to fit your need. Below is the modified merge()
function for your reference (original is here)
function merge(left,right)
var list result
while length(left) > 0 and length(right) > 0
if first(left) < first(right) // <--- change from <= to <
append first(left) to result
left = rest(left)
else if first(left) > first(right)
append first(right) to result
right = rest(right)
else // <----- added case to remove duplicated items
append first(right) to result
left = rest(left)
right = rest(right)
end
end while
if length(left) > 0
append left to result
else
append right to result
return result