views:

184

answers:

5

I have two lists in Python, like these:

temp1 = ['One', 'Two', 'Three', 'Four']
temp2 = ['One', 'Two']

I need to create a third list containing the items from the first list that aren't present in the second one. For the example above, the result should be:

temp3 = ['Three', 'Four']

Is there a fast way to do this without explicit loops and membership checks?

+2  A: 

Try this:

temp3 = set(temp1) - set(temp2)
Maciej Kucharz
+7  A: 
In [5]: list(set(temp1) - set(temp2))
Out[5]: ['Four', 'Three']
ars
Doesn't preserve order and eliminates duplicate items, but still the best solution, at least performance-wise. +1
delnan
@delnan: Actually I don't think it wins performance-wise either. See my answer for a performance comparison. It would be pretty good for codegolf though.
Mark Byers
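
To make the caveat in the comments above concrete, here is a small sketch (the sample data is made up for illustration) showing that set subtraction collapses duplicates and does not guarantee order, while filtering against a set keeps both:

```python
# Hypothetical sample data with a repeated item, for illustration only.
temp1 = ['One', 'Three', 'Two', 'Three', 'Four']
temp2 = ['One', 'Two']

# Set subtraction: duplicates collapse and the result order is arbitrary.
as_set = list(set(temp1) - set(temp2))

# Filtering against a set: keeps the original order and any duplicates.
exclude = set(temp2)
as_filtered = [x for x in temp1 if x not in exclude]

print(sorted(as_set))  # sorted only to get a stable display
print(as_filtered)
```

Which behaviour is wanted depends on whether duplicates and ordering matter for the data at hand.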
+6  A: 
temp3 = [item for item in temp1 if item not in temp2]
matt b
Turning `temp2` into a set beforehand would make this a bit more efficient.
lunaryorn
True, depends if Ockonal cares about duplicates or not (original question doesn't say)
matt b
Comment says the (lists|tuples) don't have duplicates.
delnan
+6  A: 

I'll toss this in, since none of the present solutions yields a tuple:

temp3 = tuple(set(temp1) - set(temp2))

alternatively:

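# Edited using @Mark Byers' idea. If you accept this one as the answer, just accept his instead.
# Build the set once; calling set(temp2) inside the condition would rebuild it for every item.
s = set(temp2)
temp3 = tuple(x for x in temp1 if x not in s)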

Like the other, non-tuple-yielding answers along these lines, the second version preserves order.

aaronasterling
Good catch, worth +1
delnan
+1: Based on the OP's mistakes in the question and the accepted answer, it looks likely that he really meant 'list' when he wrote 'tuple', so I have edited the question accordingly.
Mark Byers
+11  A: 

The existing solutions all offer either one or the other of:

  • Faster than O(n*m) performance.
  • Preserve order of input list.

But so far no solution has both. If you want both, try this:

s = set(temp2)
temp3 = [x for x in temp1 if x not in s]
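
If this is needed in several places, the same idea can be wrapped in a small helper. This is only a sketch: the name `ordered_difference` is made up here, and it assumes the items of the second list are hashable.

```python
def ordered_difference(first, second):
    """Return the items of `first` that are not in `second`, in the order of `first`."""
    exclude = set(second)  # built once, so each lookup is O(1) on average
    return [x for x in first if x not in exclude]


temp1 = ['One', 'Two', 'Three', 'Four']
temp2 = ['One', 'Two']
temp3 = ordered_difference(temp1, temp2)
print(temp3)  # ['Three', 'Four']
```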

Performance test

import timeit
init = 'temp1 = list(range(100)); temp2 = [i * 2 for i in range(50)]'
# print(...) with a single argument runs on both Python 2 and Python 3
print(timeit.timeit('list(set(temp1) - set(temp2))', init, number=100000))
print(timeit.timeit('s = set(temp2);[x for x in temp1 if x not in s]', init, number=100000))
print(timeit.timeit('[item for item in temp1 if item not in temp2]', init, number=100000))

Results:

4.34620224079 # ars' answer
4.2770634955  # This answer
30.7715615392 # matt b's answer

As well as preserving order, the method I presented is also (slightly) faster than the set subtraction because it doesn't require constructing an unnecessary set. The performance difference would be more noticeable if the first list is considerably longer than the second and if hashing is expensive. Here's a second test demonstrating this:

init = '''
temp1 = [str(i) for i in range(100000)]
temp2 = [str(i * 2) for i in range(50)]
'''
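
For anyone wanting to rerun this, the second test presumably reuses the three timed statements from the first test with the new `init`. A sketch under that assumption, in Python 3 syntax; the `number=5` value is an arbitrary small choice made here to keep the run short and is not the value behind the results below:

```python
import timeit

# Assumption: same three statements as in the first test, only the setup differs.
init = '''
temp1 = [str(i) for i in range(100000)]
temp2 = [str(i * 2) for i in range(50)]
'''

statements = [
    'list(set(temp1) - set(temp2))',                     # set subtraction
    's = set(temp2); [x for x in temp1 if x not in s]',  # filter against a set
    '[item for item in temp1 if item not in temp2]',     # filter against a list
]

# number=5 is an arbitrary small repeat count chosen for this sketch.
timings = [timeit.timeit(stmt, init, number=5) for stmt in statements]

for stmt, seconds in zip(statements, timings):
    print('%.4f  %s' % (seconds, stmt))
```

Absolute numbers will differ by machine and Python version, so only the relative ordering is worth comparing.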

Results:

11.3836875916 # ars' answer
3.63890368748 # this answer (3 times faster!)
37.7445402279 # matt b's answer
Mark Byers
+1 for benchmarks (proving my assumption wrong, rightfully so) and having the fastest answer :)
delnan
@delnan: Thanks for the vote! :) :)
Mark Byers