views:

189

answers:

5

I have two nested lists, each nested list containing two strings e.g.:

list 1 [('EFG', '[3,4,5]'), ('DEF', '[2,3,4]')] and list 2 [('DEF', '[2,3,4]'), ('FGH', '[4,5,6]')]

I would like to compare the two lists and recover those nested lists which are identical with each other. In this case only ('DEF','[2,3,4]') would be returned. The lists could get long. Is there an efficient way to do this?

+7  A: 

If the lists only contain string tuples, then the easiest way to do it is using sets intersection (&):

>>> set([('EFG', '[3,4,5]'), ('DEF', '[2,3,4]')]) & set([('DEF', '[2,3,4]'), ('FGH', '[4,5,6]')])
set([('DEF', '[2,3,4]')])
Nadia Alramli
A: 

Here's one approach. Convert your lists to sets and find their intersections.

>>> a,b = set([('EFG', '[3,4,5]'), ('DEF', '[2,3,4]')]), set([('DEF', '[2,3,4]'), ('FGH', '[4,5,6]')])
>>> print a.intersection(b)
set([('DEF', '[2,3,4]')])

I don't think it will degrade much with the length of your lists.

Noufal Ibrahim
A: 

Just use set the builtin support for sets

def get_set(list1,list2):
    s = set(list1)
    s &= set(list2)
    return s

And Boom there's your union

RHicke
union? Don't you mean *intersection*?
Noufal Ibrahim
A: 
intersection = [item for item in list1 if item in list2]

But only use it, if for some reason you have mutables in your lists and can’t convert to a set. Of course, it’s all depending on the size of your lists and the number of duplicates, but it could well be a factor of 10 times slower.

Debilski
This is usually going to be more expensive than set-based solutions.
Brian
Yes. Seeing it myself now. It’s a factor of 10 in my tests.
Debilski
However, this will work if the types contained in the list are not hashable, whereas all the set-based solutions will fail.
Ignacio Vazquez-Abrams
+3  A: 

Using sets to do this is a very good implementation, but for optimal performance you should use only one set:

set1 = set([('EFG', '[3,4,5]'), ('DEF', '[2,3,4]')])
matches = [x for x in [('DEF', '[2,3,4]'), ('FGH', '[4,5,6]')] if x in set1]
Ignacio Vazquez-Abrams
Could you explain why this is better?
Jeff
Generating a set from a sequence requires iterating through it, so generating both sets requires iterating through both sequences. Additionally, performing an intersection between two sets requires iterating through one of them and performing a membership operation on the other. Therefore, turning them both into sequences and intersecting requires iterating through one sequence once, and the other sequence twice. The solution I provided converts only one of the lists, then iterates through the other directly, saving one iteration total.
Ignacio Vazquez-Abrams
If the list -> set conversions and set intersection happen in c++ rather than python, it may be faster.
tgray