I have two very large lists; looping through one of them once takes at least a second, and I need to do it 200,000 times. What's the fastest way to merge the two lists into one with duplicates removed?
This is the fastest way I can think of:
import itertools
output_list = list(set(itertools.chain(first_list, second_list)))
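If you want to see how this performs on your own data, here's a quick sketch using timeit (the list sizes below are made up purely for illustration; substitute your real lists):

import itertools
import timeit

# hypothetical sizes for illustration; use your actual data
first_list = list(range(1000000))
second_list = list(range(500000, 1500000))

def merge_chain():
    return list(set(itertools.chain(first_list, second_list)))

# run the merge 10 times and print the total time in seconds
print(timeit.timeit(merge_chain, number=10))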
Slight update: As jcd points out, depending on your application, you probably don't need to convert the result back to a list. Since a set is itself iterable, you can probably just use it directly:
output_set = set(itertools.chain(first_list, second_list))
for item in output_set:
    # do something
Beware, though, that any solution involving set() will discard the original ordering: sets are unordered, so there's no guarantee that elements will come out in any particular order. That said, since you're combining two lists, it's hard to come up with a good reason why you would need a particular ordering over them anyway, so this is probably not something you need to worry about.
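If you do need to keep the order in which items first appear, one option (not from the answers here, so treat it as a sketch) is to deduplicate through dict.fromkeys(), since dict keys are unique and, from Python 3.7 on, preserve insertion order:

import itertools

def merged_in_order(first_list, second_list):
    # dict keys are unique and keep insertion order (Python 3.7+),
    # so this removes duplicates while preserving first-seen order
    return list(dict.fromkeys(itertools.chain(first_list, second_list)))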
result = list(set(list1).union(set(list2)))
That's how I'd do it. I'm not so sure about performance, but it's certainly better than doing it by hand. (As an aside, union() accepts any iterable, so set(list1).union(list2) would also work without building the second set explicitly.)
As Daniel states, a set cannot contain duplicate entries - so concatenate the lists:
list1 + list2
Then convert the new list to a set:
set(list1 + list2)
Then back to a list:
list(set(list1 + list2))
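For example, with two small lists (values made up for illustration):

list1 = [1, 2, 3, 4]
list2 = [3, 4, 5, 6]
combined = list(set(list1 + list2))
print(combined)  # 3 and 4 appear only once; set order is arbitrary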
I'd recommend something like this:
def combine_lists(list1, list2):
    s = set(list1)     # start from the unique items of list1
    s.update(list2)    # add list2's items in place, no concatenated copy
    return list(s)
This avoids building a monster intermediate list from the concatenation of the first two.
Depending on what you're doing with the output, don't bother to convert back to a list. If ordering is important, you might need some sort of decorate/sort/undecorate shenanigans around this.
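If a sorted result is what you're after, note that sorted() already handles the decorate/sort/undecorate pattern internally via its key argument. A minimal sketch (the key parameter here is a hypothetical placeholder, adjust for your data):

def combine_lists_sorted(list1, list2, key=None):
    s = set(list1)
    s.update(list2)
    # sorted() performs decorate/sort/undecorate for you via key
    # (key is a hypothetical parameter shown for illustration)
    return sorted(s, key=key)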