ansaurus

Question

In Python, how do I check whether a list contains any duplicates?

Answer 1

+4 A:

Recommended for short lists only:

any(thelist.count(x) > 1 for x in thelist)

Do not use on a long list -- it can take time proportional to the square of the number of items in the list!

For longer lists with hashable items (strings, numbers, etc.):

def anydup(thelist):
    seen = set()  # items encountered so far
    for x in thelist:
        if x in seen:
            return True  # stop at the first repeat
        seen.add(x)
    return False

If your items are not hashable (sublists, dicts, etc.) it gets hairier, though it may still be possible to get O(N log N) if they're at least comparable. But you need to know or test the characteristics of the items (hashable or not, comparable or not) to get the best performance you can: O(N) for hashables, O(N log N) for non-hashable comparables; otherwise it's down to O(N squared), and there's nothing one can do about it :-(
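To make the two non-hashable cases concrete, here is a minimal sketch (not part of the original answer). The names anydup_sorted and anydup_quadratic are made up for illustration. It assumes Python 3, where lists can be ordered with < but dicts cannot, and it assumes the ordering is consistent with equality so that equal items end up adjacent after sorting.

```python
def anydup_sorted(items):
    """Return True if any two items compare equal; needs orderable items."""
    # O(N log N); sorted() raises TypeError if the items cannot be ordered.
    ordered = sorted(items)
    # After sorting, equal items sit next to each other, so one pass suffices.
    return any(a == b for a, b in zip(ordered, ordered[1:]))


def anydup_quadratic(items):
    """Fallback that needs only ==; O(N squared) comparisons in the worst case."""
    items = list(items)
    for i, x in enumerate(items):
        # Compare each item only against the items that come after it.
        if any(x == y for y in items[i + 1:]):
            return True
    return False
```

For example, anydup_sorted([[1, 2], [3], [1, 2]]) works because sublists are orderable, while a list of dicts would have to go through anydup_quadratic.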

Alex Martelli 2009-10-09 04:36:37

Denis Otkidach offered a solution where you just build a new set from the list, then check its length. Its advantage is that it lets the C code inside Python do the heavy lifting. Your solution loops in Python code, but has the advantage of short-circuiting as soon as a single match has been found. If the list probably has no duplicates, I like Denis Otkidach's version, but if there might well be a duplicate early on in the list, this solution is better.
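A rough sketch of that trade-off, for anyone who wants to measure it on their own data. The names anydup_setlen and time_both are hypothetical helpers, not from either answer, and no particular timings are claimed here since they depend on the list and the Python build.

```python
import itertools
import timeit


def anydup(items):
    """Short-circuiting check (the loop from the answer above)."""
    seen = set()
    for x in items:
        if x in seen:
            return True
        seen.add(x)
    return False


def anydup_setlen(items):
    """Build the whole set in C, then compare sizes (needs len(), so no iterators)."""
    return len(items) != len(set(items))


def time_both(data, repeats=5):
    """Hypothetical helper: seconds taken by each approach over `repeats` runs."""
    loop_time = timeit.timeit(lambda: anydup(data), number=repeats)
    set_time = timeit.timeit(lambda: anydup_setlen(data), number=repeats)
    return loop_time, set_time


# The loop stops at the second item, so it even works on an endless iterator.
stream = itertools.chain([1, 1], itertools.count(2))
early = anydup(stream)  # returns after consuming only two items
```

Calling time_both(list(range(100000))) gives the no-duplicates case, and time_both([0, 0] + list(range(100000))) gives the early-duplicate case.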

steveha 2009-10-09 05:26:17

Worth an up for the detail, even though I think Denis had the neater solution.

Steve314 2009-10-09 05:30:53

@steveha - premature optimisation?

Steve314 2009-10-09 05:32:15

@Steve314, what premature optimization? I would have written it the way Denis Otkidach wrote it, so I was trying to understand why Alex Martelli (of Python Cookbook fame) wrote it differently. After I thought about it a bit I realized that Alex's version short-circuits, and I posted a few thoughts on the differences. How do you go from a discussion of differences to premature optimization, the root of all evil?

steveha 2009-10-09 16:47:19

Answer 2

+16 A:

Use set() to remove duplicates if all values are hashable, then compare the lengths:

>>> your_list = ['one', 'two', 'one']
>>> len(your_list) != len(set(your_list))
True

Denis Otkidach 2009-10-09 04:38:45
