ansaurus

Question

Answer 1

+5 A:

Will Robertson 2009-10-27 14:39:36

Eh, I don't understand it perfectly, but it works, so thanks! I'll dig into it some more :)

Martin Janiczek 2009-10-27 14:48:51

dreeves 2009-10-27 15:07:03

@dreeves: Personal preference only; I sometimes find the `[[` `]]` notation harder to read in this sort of context. I tend to only use it when explicitly grabbing element(s) from a variable, rather than as a step in a chain of functions that are manipulating data. Not sure if that makes sense `:)`

Will Robertson 2009-10-27 15:25:24

(Continued after a thought break.) So, actually, in the broken down example I'd usually be happy to write `%[[All,1]]`. I suppose it depends on the linear-ness of the nesting of the functions. Only a gut-feeling kind of thing, I'd say, and one which changes over time.

Will Robertson 2009-10-27 15:27:34

I'm with you, Will. In fact, I often use `f@x` instead of `f[x]` to avoid Lisp-like nested brackets since I find them hard to read. In other news, check out the solution I just added!

dreeves 2009-10-27 17:22:15

Answer 2

A:

Given a list A,
get the non-duplicate values in B
B = DeleteDuplicates[A]
get the duplicate values in C
C = Complement[A,B]
get the non-duplicate values from the duplicate list in D
D = DeleteDuplicates[C]

So for your example:
A = 1, 2, 2, 2, 3, 4, 4
B = 1, 2, 3, 4
C = 2, 2, 4
D = 2, 4

So your answer would be `DeleteDuplicates[Complement[x, DeleteDuplicates[x]]]`, where x is your list. I don't know Mathematica, so the syntax may or may not be perfect here; I'm just going by the docs on the page you linked to.

Brian Schroth 2009-10-27 14:39:54

Complement[A,B] returns {}, not {2,2,4}. The problem is that it takes away all of the 2's and 4's, not just one of them.
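
This is easy to check with a minimal sketch on the example list from the answer. Lowercase names are used here because `C` and `D` are protected built-in symbols in Mathematica, so they cannot be assigned to as in the answer:

```mathematica
a = {1, 2, 2, 2, 3, 4, 4};
b = DeleteDuplicates[a]   (* {1, 2, 3, 4} *)
Complement[a, b]          (* {}: every element of a also occurs in b *)
```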

Martin Janiczek 2009-10-27 14:46:20

I was afraid that might be the case :(

Brian Schroth 2009-10-27 16:02:06

Answer 3

+3 A:

Here's a way to do it in a single pass through the list:

collectDups[l_] := Block[{i}, i[n_]:= (i[n] = n; Unevaluated@Sequence[]); i /@ l]

For example:

collectDups[{1, 1, 6, 1, 3, 4, 4, 5, 4, 4, 2, 2}] --> {1, 1, 4, 4, 4, 2}

If you want the list of unique duplicates -- {1, 4, 2} -- then wrap the above in DeleteDuplicates, which is another single pass through the list (Union is less efficient as it also sorts the result).

collectDups[l_] := 
  DeleteDuplicates@Block[{i}, i[n_]:= (i[n] = n; Unevaluated@Sequence[]); i /@ l]
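
The difference between the two is easy to see on the raw output from the example above (a small sketch; `dups` is just an illustrative name):

```mathematica
dups = {1, 1, 4, 4, 4, 2};
DeleteDuplicates[dups]   (* {1, 4, 2}: order of first appearance is kept *)
Union[dups]              (* {1, 2, 4}: duplicates removed and result sorted *)
```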

Will Robertson's solution is probably better just because it's more straightforward, but I think if you wanted to eke out more speed, this should win. But if you cared about that, you wouldn't be programming in Mathematica! :)
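
For comparison, a `Tally`-based approach might look like the sketch below. This is an assumption about the general shape of such a solution, not a quotation of Will Robertson's answer, and the name `tallyDups` is hypothetical:

```mathematica
(* tallyDups is a hypothetical name, not taken from the thread. *)
(* Tally returns {element, count} pairs in order of first appearance; *)
(* Cases keeps the element of each pair whose count exceeds 1. *)
tallyDups[l_List] := Cases[Tally[l], {x_, n_ /; n > 1} :> x]

tallyDups[{1, 1, 6, 1, 3, 4, 4, 5, 4, 4, 2, 2}]   (* {1, 4, 2} *)
```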

dreeves 2009-10-27 17:17:26

That doesn't work. You need to change the definition for `i` to `i[n_]:=(i[n]=Unevaluated@Sequence[];n)`. Also, quick testing shows that it's a good deal slower than Will Robertson's solution; it's hard to beat built-in functions like `Tally`, which are written in C or C++ and can take advantage of things like array packing.

Pillsy 2009-10-27 18:22:13

Why do you say it doesn't work? Pasting the function into Mathematica exactly as above yields the right output for that example. Do you have an example where it doesn't?

dreeves 2009-10-27 19:33:27

The desired behavior is to return a list with a single copy of all duplicated elements, not every copy of all the duplicated elements. For the example you use, the returned value should be {1, 4, 2}.

Pillsy 2009-10-27 19:43:00

Of course, my solution doesn't work right either. I really wish you could edit comments, but since you can't, I'll add it as an answer.

Pillsy 2009-10-27 19:47:22

I just did some testing as well. I'm confident my solution works as written, but you're right that Will's solution is around twice as fast. I stand by the claim that this one makes a single pass whereas Will's makes multiple passes. But you're right, maybe array packing or whatnot makes up the difference.

dreeves 2009-10-27 19:59:26

Looking at it some more, I'm convinced that your solution works as written, too, because you mention using `DeleteDuplicates`; if you edit your answer at all, I can change my vote from a downvote to an upvote. It's a good solution.

Pillsy 2009-10-27 20:05:31

Note the second-to-last paragraph in my answer. The proposed tweak in your first comment changes "collectDups" to "deleteDups", which the builtin function DeleteDuplicates already does. Theoretically speaking, this answer still makes fewer passes than Will's since DeleteDuplicates is just one additional pass.

dreeves 2009-10-27 20:07:00

Ah, thanks Pillsy. Sorry, our comments kept crossing in the ether! :)

dreeves 2009-10-27 20:07:36

If the list were unsorted, my solution would also need `Sort`, which would close the performance difference (especially for large lists).

Will Robertson 2009-10-27 22:01:15

Answer 4

+2 A:

Using a solution like dreeves's, but returning only a single instance of each duplicated element, is a bit on the tricky side. One way of doing it is as follows:

collectDups1[l_] :=
  Module[{i, j},
    i[n_] := (i[n] := j[n]; Unevaluated@Sequence[]);
    j[n_] := (j[n] = Unevaluated@Sequence[]; n);
    i /@ l];

This doesn't precisely match the output produced by Will Robertson's (IMO superior) solution, because elements appear in the returned list in the order in which they can be determined to be duplicates. I'm not sure it really can be done in a single pass; all the ways I can think of involve, in effect, at least two passes, although one might be over only the duplicated elements.
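
To illustrate the ordering difference, here is a small sketch. The `tallyDups` helper is a hypothetical `Tally`-based stand-in used only for comparison; it is an assumption, not Will Robertson's actual code:

```mathematica
(* The definition from above, repeated so the block is self-contained. *)
collectDups1[l_] :=
  Module[{i, j},
    i[n_] := (i[n] := j[n]; Unevaluated@Sequence[]);
    j[n_] := (j[n] = Unevaluated@Sequence[]; n);
    i /@ l];

(* tallyDups is a hypothetical Tally-based helper, used only for comparison. *)
tallyDups[l_List] := Cases[Tally[l], {x_, n_ /; n > 1} :> x]

collectDups1[{1, 2, 2, 1}]   (* {2, 1}: 2 is seen twice before 1 is *)
tallyDups[{1, 2, 2, 1}]      (* {1, 2}: order of first appearance *)
```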

Pillsy 2009-10-27 20:01:46


Show duplicates in Mathematica
