How do I find the total number of duplicates in a string? i.e., if it was j= [1,1,1,2,2,2] it would find 4 duplicates? I've only been able to find counting which shows how many times each individual number occurred.

+14  A: 
>>> j= [1,1,1,2,2,2]
>>> len(j) - len(set(j))
4

By the way, `j` is a list and not a string, although for the purposes of this exercise it doesn't really matter.
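As a minimal sketch of the same idea wrapped in a function (the name `count_duplicates` is hypothetical, and the elements are assumed to be hashable so they can go into a `set`):

```python
def count_duplicates(items):
    """Return how many elements repeat a value that already occurred in items."""
    items = list(items)  # materialize so that iterators can be measured too
    return len(items) - len(set(items))


print(count_duplicates([1, 1, 1, 2, 2, 2]))  # 4
print(count_duplicates("aabbb"))             # 3
```

This works on a string as well as a list, since both are sequences of hashable items.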

SilentGhost
A: 

There seems to be a popular answer already, but if you would like to maintain the individual duplicate counts as well, the new Counter() collection object in Python 2.7 is perfect for this.

>>> from collections import Counter

>>> j = [1,1,1,2,2,2]

>>> Counter(j)
Counter({1: 3, 2: 3})

>>> sum([i - 1 for i in c.values() if i > 1])
4

>>> {k: v - 1 for k, v in c.items()} # individual dupes
{1: 2, 2: 2}

There is a backport for Counter at ActiveState
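The session above uses `c` without showing where it is assigned. A self-contained sketch, assuming `c = Counter(j)` was the intended step and written for Python 3 (the names `total_dupes` and `dupes_by_value` are introduced here for illustration only):

```python
from collections import Counter

j = [1, 1, 1, 2, 2, 2]
c = Counter(j)  # assumed: the assignment the session above leaves implicit

# Total duplicates: every occurrence beyond the first of each value.
total_dupes = sum(count - 1 for count in c.values() if count > 1)

# Duplicates per individual value.
dupes_by_value = {value: count - 1 for value, count in c.items()}

print(c)               # Counter({1: 3, 2: 3})
print(total_dupes)     # 4
print(dupes_by_value)  # {1: 2, 2: 2}
```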

jonwd7
Some suggestions/observations: One needs to assume that `c = Counter(j)` gets executed in there somewhere. With your sum thingy, you require Python 2.7, therefore you can lose the `[]`. Secondly, the 'if' clause is redundant. Thirdly, there's no point building the list `c.values()`. Result: `sum(i - 1 for i in c.itervalues())` or after some algebra, try `sum(c.itervalues()) - len(c)`. Try adding a non-dupe e.g. 3 to the input. Check whether your dictionary of individual dupes is really what you intended. HTH.
John Machin
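The algebra in the comment above can be checked with a short sketch. It assumes Python 3, where `itervalues()` no longer exists and `values()` returns a view rather than a list; the variable names are illustrative only.

```python
from collections import Counter

j = [1, 1, 1, 2, 2, 2, 3]  # 3 appears once, so it is not a duplicate
c = Counter(j)

by_generator = sum(n - 1 for n in c.values())  # generator, no intermediate list
by_algebra = sum(c.values()) - len(c)          # total count minus distinct values
by_set = len(j) - len(set(j))                  # the set-based form

print(by_generator, by_algebra, by_set)        # 4 4 4
print({k: v - 1 for k, v in c.items()})        # {1: 2, 2: 2, 3: 0}
```

With the non-duplicate `3` in the input, the per-value dictionary gains a `3: 0` entry, which is the case the comment asks the answerer to check.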
OK… Firstly, tell me how `i > 1` is "redundant"? I **need** to prevent `i < 0` in case negative integers were to ever make their way into the dict. So the comparison happens regardless. Now consider a list of 1M items, where 999,000 are `0` and then tell me I shouldn't go ahead and prevent `0` from appending to the list also. "After some algebra" sounds like you're telling me I need to learn it, but as I just said I'd like to make certain `-1000000` doesn't belong to the dict, in case something other than `Counter()` modifies it.
jonwd7
Secondly, my using `Counter()` doesn't necessitate 2.7 whatsoever as I've included a link to the backport. (I do use a dict comprehension but that can very quickly be changed). Therefore I may or may not be able to "lose the `[]`"... But what does that *matter*? Thirdly, `itervalues`/`iteritems` was useless for this simple example and harder to type, so forgive me. And lastly, I don't really have any idea what you mean about a non-dupe 3... It would print out `..., 3: 0}` and that may or may not be exactly what the OP *et al* want to happen in that situation. `i > 0` then?
jonwd7
… and clearly `c = Counter(j)` happens. It's *implied* and I wasn't going to waste 2 extra lines on showing it. And to answer "Check whether your dictionary of individual dupes is really what you intended." directly: It was exactly as I intended.
jonwd7
Backport works back to 2.5; generator expressions (no []) introduced in 2.4; avoiding creating a large list just to sum its values matters. If your counts have gone negative, you should be chucking Exceptions like fireworks on 4 July, not covering up the problem. The i > 1 thing is as you say a problem if you have a large list; solution: DON'T USE A LIST!
John Machin
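One way to read the last comment is to fail loudly on bad counts instead of filtering them out. The following is only a sketch under that reading: the function name `total_duplicates`, the choice of `ValueError`, and treating any count below 1 as invalid are all assumptions, not something the thread specifies.

```python
from collections import Counter


def total_duplicates(counts):
    """Sum the occurrences beyond the first, rejecting invalid counts."""
    for value, n in counts.items():
        if n < 1:
            # Assumption: a count below 1 means something corrupted the mapping.
            raise ValueError(f"invalid count {n!r} for {value!r}")
    # No list is built: the values are summed directly.
    return sum(counts.values()) - len(counts)


print(total_duplicates(Counter([1, 1, 1, 2, 2, 2])))  # 4
```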