ansaurus

Question

Python: Find identical items in multiple lists

Answer 1

A:

flatten, sort, 1 for loop comparing numbers before and after

joetsuihk 2010-01-22 09:42:35

This would fail if a number occurs twice in the same list.

Max Shawabkeh 2010-01-22 09:47:22

If you flatten first, you would get a "false positive" on input such as: [[1, 1], [2, 2]]

truppo 2010-01-22 09:48:54

Answer 2

+11 A:

The same way as you'd do it by hand:

seen = set()
repeated = set()
for l in list_of_lists:
  for i in set(l):
    if i in seen:
      repeated.add(i)
    else:
      seen.add(i)

By the way, here's the one liner (without counting the import) that some people were seeking (should be less efficient than the other approach)

from itertools import *
reduce(set.union, (starmap(set.intersection, combinations(map(set, ll), 2))))

fortran 2010-01-22 09:42:51

Great use of set!

Khelben 2010-01-22 09:45:40

Just be careful, as `l` needs also to be 'setted', or you'll get false positives in [[1,1],[2,2]]

Khelben 2010-01-22 09:58:06

+1 for readability

Till Backhaus 2010-01-22 09:58:09

as @Khelben says you should change `for i in l:` in `for i in set(l):`

Andrea Ambu 2010-01-22 10:09:19

With @Khelben's correction applied that solution worked well. Thanks!

mathias 2010-01-22 10:19:07

@Khelben thanks for pointing out, I was already thinking in "set mode" instead of lists ^_^

fortran 2010-01-22 11:36:04

Answer 3

A:

You can use a set see http://docs.python.org/library/stdtypes.html#set

luc 2010-01-22 09:43:23

Answer 4

+1 A:

You can use a dictionary to get the count of each

from collections import defaultdict

init_list = [[1,2,3], [3,4,5], [5,6,7], [7,8,9]]
#defaultdict, every new key will have a int(0) as default value
d = defaultdict(int)
for values in init_list:
  #Transform each list in a set to avoid false positives like [[1,1],[2,2]]
  for v in set(values):
    d[v] += 1

#Get only the ones that are more than once
final_list = [ value for value,number in d.items() if number > 1 ]

Khelben 2010-01-22 09:44:13

Looks like a good job for `collections.defaultdict`.

muhuk 2010-01-22 10:38:00

@muhuk. Yes! you're absolutely right! I've changed the code. The standard library of Python is really big and full of useful things...

Khelben 2010-01-22 10:57:41

Answer 5

A:

l=[[1,2,3], [3,4,5], [5,6,7], [7,8,9]]
d={}
for x in l:
    for y in x:
        if not d.has_key(y):
            d[y]=0
        d[y]+=1
[x for x,y in d.iteritems() if y>1]

Till Backhaus 2010-01-22 09:47:15

Answer 6

+2 A:

reference: http://docs.python.org/library/stdtypes.html#set

#!/usr/bin/python

ll = [[1,2,3], [3,4,5], [5,6,7], [7,8,9]]
ls = [set(l) for l in ll]

su = ls[0]  #union
ssd = ls[0] #symmetric_difference
for s in ls[1:]:
  su = su.union(s)
  ssd = ssd.symmetric_difference(s)

result = su.difference(ssd)
print list(result)

=>

[3, 5, 7]

revise and adopt FP,

ll = [[1,2,3], [3,4,5], [5,6,7], [7,8,9]]

u = reduce(set.union, map(set, ll))
sd = reduce(set.symmetric_difference, map(set, ll))
print u - sd

=>

[3, 5, 7]

Dyno Fu 2010-01-22 09:47:35

Answer 7

A:

Try this:

data = [[1,2,3], [3,4,5], [5,6,7], [7,8,9], [1,2,3]]

res = set()

for i in data:
    for j in data:
        if i is not j:
            res |= set(i) & set(j)

print res

liwp 2010-01-22 09:48:58

Answer 8

+4 A:

Cleanest way would probably be to use reduce:

def findCommon(L):
    def R(a, b, seen=set()):
        a.update(b & seen)
        seen.update(b)
        return a
    return reduce(R, map(set, L), set())

result = findCommon([[1,2,3], [3,4,5], [5,6,7], [7,8,9]])

Result is a set, but just do list(result) if you really need a list.

truppo 2010-01-22 09:59:24

+1 for functional programming.

Dyno Fu 2010-01-22 10:02:31

@Dyno the `seen` variable contains a "hidden" mutable state between calls... functional programming shouldn't rely on side effects (and it's a peril if they are not explicit, I bet that most of people wouldn't notice that seen is playing the role of a "static" variable instead of a regular argument)

fortran 2010-01-22 11:40:06

@fortran: The typical usage of functions such as R is one-time, since they are placed inside another function and not at module level. I'll update the answer to make that clearer.

truppo 2010-01-22 11:50:09

@truppo I know that, but even with that premises an incautious reader could be fooled... I think it would be cleaner to move `seen` outside the parameter list of `R` and make it a local variable of `findCommon`.

fortran 2010-01-22 12:56:19

Answer 9

+1 A:

>>> sets = [[1,2,3], [3,4,5], [5,6,7], [7,8,9]]
>>> seen = set()
>>> duplicates = set()
>>> 
>>> for subset in map(set, sets) :
...     duplicates |= (subset & seen)
...     seen |= subset
... 
>>> print(duplicates)
set([3, 5, 7])
>>>

I tried for a one-line answer with map/reduce, but can't quite get it yet.

sykora 2010-01-22 10:06:29

Answer 10

A:

Here is my go:

seen = set()
result = set()
for s in map(set, [[1,2,3], [3,4,5], [5,6,7], [7,8,9]]):
    result.update(s & seen)
    seen.update(s)
print result

This prints:

set([3, 5, 7])

Daren Thomas 2010-01-22 10:27:22

ansaurus

tags:

views:

answers:

Python: Find identical items in multiple lists

related questions