In Mathematica I have a list:

x = {1,2,3,3,4,5,5,6}

How can I make a list of the duplicated elements? For example:

{3,5}

I have been looking through the "Lists as Sets" page to see whether there is something like Except[] for lists, so that I could do:

unique = Union[x]
duplicates = MyExcept[x,unique]

(Of course, if x contained an element more than twice, say {1,2,2,2,3,4,4}, the output would be {2,2,4}, but an additional Union[] would solve this.)

But I couldn't find anything like that (assuming I understood all the functions there correctly).

So, how can I do that?

+5  A: 
Will Robertson
Eh, I don't understand it perfectly, but it works, so thanks! I'll dig into it some more :)
Martin Janiczek
dreeves
@dreeves: Personal preference only; I sometimes find the `[[` `]]` notation harder to read in this sort of context. I tend to only use it when explicitly grabbing element(s) from a variable, rather than as a step in a chain of functions that are manipulating data. Not sure if that makes sense `:)`
Will Robertson
(Continued after a thought break.) So, actually, in the broken down example I'd usually be happy to write `%[[All,1]]`. I suppose it depends on the linear-ness of the nesting of the functions. Only a gut-feeling kind of thing, I'd say, and one which changes over time.
Will Robertson
I'm with you, Will. In fact, I often use `f@x` instead of `f[x]` to avoid Lisp-like nested brackets since I find them hard to read. In other news, check out the solution I just added!
dreeves
A: 

Given a list A,
get the distinct values in B:
B = DeleteDuplicates[A]
get the duplicate values in C:
C = Complement[A,B]
get the distinct values from the duplicate list in D:
D = DeleteDuplicates[C]

So for your example:
A = 1, 2, 2, 2, 3, 4, 4
B = 1, 2, 3, 4
C = 2, 2, 4
D = 2, 4

So your answer would be DeleteDuplicates[Complement[x,DeleteDuplicates[x]]], where x is your list. I don't know Mathematica, so the syntax may or may not be perfect here; I'm just going by the docs on the page you linked to.

Brian Schroth
Complement[A,B] returns {}, not {2,2,4}. The problem is that it takes away all of the 2's and 4's, not just one of them.
Martin Janiczek
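The behaviour described in the comment above is easy to check. A minimal sketch using the example list from the question (the names `a` and `b` are just placeholders):

```mathematica
a = {1, 2, 2, 2, 3, 4, 4};
b = DeleteDuplicates[a];   (* distinct values: {1, 2, 3, 4} *)

(* Complement is a set difference, so multiplicity is ignored:
   every 2 and every 4 is removed, not just one copy of each. *)
Complement[a, b]           (* {} *)
```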
I was afraid that might be the case :(
Brian Schroth
+3  A: 

Here's a way to do it in a single pass through the list:

collectDups[l_] := Block[{i}, i[n_]:= (i[n] = n; Unevaluated@Sequence[]); i /@ l]

For example:

collectDups[{1, 1, 6, 1, 3, 4, 4, 5, 4, 4, 2, 2}] --> {1, 1, 4, 4, 4, 2}
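To make the mechanism explicit, here is the same definition spelled out with comments. This is only an annotated sketch; the name `collectDupsVerbose` and the helper symbol `seen` are made up for illustration.

```mathematica
collectDupsVerbose[list_List] :=
  Block[{seen},
    (* General rule, used the first time a value v is met:
       store the specific rule seen[v] = v and contribute nothing,
       because an empty Sequence[] vanishes from the enclosing list. *)
    seen[v_] := (seen[v] = v; Unevaluated@Sequence[]);
    (* On later occurrences the stored rule seen[v] = v takes priority
       over the general one, so the value itself is returned. *)
    seen /@ list]

collectDupsVerbose[{1, 1, 6, 1, 3, 4, 4, 5, 4, 4, 2, 2}]
(* {1, 1, 4, 4, 4, 2} *)
```

Each element is visited once, and every occurrence after the first ends up in the result.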

If you want the list of unique duplicates -- {1, 4, 2} -- then wrap the above in DeleteDuplicates, which is another single pass through the list (Union is less efficient as it also sorts the result).

collectDups[l_] := 
  DeleteDuplicates@Block[{i}, i[n_]:= (i[n] = n; Unevaluated@Sequence[]); i /@ l]
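
The difference between the two built-ins mentioned above shows up in the ordering of the result. A small sketch, using the output of the earlier example as input (the name `dups` is just a placeholder):

```mathematica
dups = {1, 1, 4, 4, 4, 2};

DeleteDuplicates[dups]  (* {1, 4, 2}: order of first appearance is kept *)
Union[dups]             (* {1, 2, 4}: the result is sorted *)
```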

Will Robertson's solution is probably better just because it's more straightforward, but I think if you wanted to eke out more speed, this should win. But if you cared about that, you wouldn't be programming in Mathematica! :)

dreeves
That doesn't work. You need to change the definition for `i` to `i[n_]:=(i[n]=Unevaluated@Sequence[];n)`. Also, quick testing shows that it's a good deal slower than Will Robertson's solution; it's hard to beat built-in functions like `Tally`, which are written in C or C++ and can take advantage of things like array packing.
Pillsy
Why do you say it doesn't work? Pasting the function into Mathematica exactly as above yields the right output for that example. Do you have an example where it doesn't?
dreeves
The desired behavior is to return a list with a single copy of all duplicated elements, not every copy of all the duplicated elements. For the example you use, the returned value should be {1, 4, 2}.
Pillsy
Of course, my solution doesn't work right either. I really wish you could edit comments, but since you can't, I'll add it as an answer.
Pillsy
I just did some testing as well. I'm confident my solution works as written, but you're right that Will's solution is around twice as fast. I stand by the claim that this one makes a single pass whereas Will's makes multiple passes. But you're right, maybe array packing or whatnot makes up the difference.
dreeves
Looking at it some more, I'm convinced that your solution works as written, too, because you mention using `DeleteDuplicates`; if you edit your answer at all, I can change my vote from a downvote to an upvote. It's a good solution.
Pillsy
Note the second-to-last paragraph in my answer. The proposed tweak in your first comment changes "collectDups" to "deleteDups", which the builtin function DeleteDuplicates already does. Theoretically speaking, this answer still makes fewer passes than Will's since DeleteDuplicates is just one additional pass.
dreeves
Ah, thanks Pillsy. Sorry, our comments kept crossing in the ether! :)
dreeves
If the list was unsorted then my solution also needs `Sort`, which would close the performance difference (especially for large lists).
Will Robertson
+2  A: 

Using a solution like dreeves's, but returning only a single instance of each duplicated element, is a bit on the tricky side. One way of doing it is as follows:

collectDups1[l_] :=
  Module[{i, j},
    i[n_] := (i[n] := j[n]; Unevaluated@Sequence[]);
    j[n_] := (j[n] = Unevaluated@Sequence[]; n);
    i /@ l];

This doesn't precisely match the output produced by Will Robertson's (IMO superior) solution, because elements appear in the returned list in the order in which they can be determined to be duplicates. I'm not sure whether it really can be done in a single pass; all the ways I can think of involve, in effect, at least two passes, although one of them might only be over the duplicated elements.
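
The ordering difference can be seen on a small input. This is a sketch only: `tallyDups` is a hypothetical helper written here to stand in for a `Tally`-based approach of the kind the comments attribute to Will Robertson's answer, and it is not necessarily the code he posted.

```mathematica
collectDups1[l_] :=
  Module[{i, j},
    i[n_] := (i[n] := j[n]; Unevaluated@Sequence[]);
    j[n_] := (j[n] = Unevaluated@Sequence[]; n);
    i /@ l];

(* Hypothetical Tally-based helper: keep values whose count exceeds 1. *)
tallyDups[l_] := Cases[Tally[l], {v_, c_ /; c > 1} :> v];

collectDups1[{4, 1, 1, 4}]  (* {1, 4}: the second 1 is reached before the second 4 *)
tallyDups[{4, 1, 1, 4}]     (* {4, 1}: Tally keeps the order of first appearance *)
```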

Pillsy