views: 479

answers: 4
I have two very large lists, and looping through one of them once takes at least a second. I need to do this 200,000 times. What's the fastest way to remove duplicates across two lists to form one?

+9  A: 

This is the fastest way I can think of:

import itertools
output_list = list(set(itertools.chain(first_list, second_list)))

Slight update: As jcd points out, depending on your application, you probably don't need to convert the result back to a list. Since a set is iterable by itself, you might be able to just use it directly:

output_set = set(itertools.chain(first_list, second_list))
for item in output_set:
    # do something

Beware, though, that any solution involving `set()` will probably reorder the elements, so there's no guarantee the result will be in any particular order. That said, since you're combining two lists, it's hard to come up with a good reason why you would need a particular ordering over them anyway, so this is probably not something you need to worry about.

Daniel Pryden
Oh, your solution is better than mine :)
shylent
Thanks for everyone's answers, they've all helped a lot! :)
Cookies
+1. If order *is* important, then perhaps an ordered set will do: http://stackoverflow.com/questions/1653970/does-python-have-an-ordered-set
Stephan202
@Stephan202: Good point. However, I would think switching to an ordered set (with any implementation, but especially a pure Python one) would be significantly slower than an unordered (hashtable-based) set. Depending on the size of the list and how well the hash function performs on the items, it might be faster to just use a normal `set` and then sort the items afterward.
Daniel Pryden
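To illustrate the sort-afterward idea from the comment above, here is a minimal sketch (the sample lists are made up for demonstration):

```python
import itertools

first_list = [3, 1, 2, 3]
second_list = [2, 4, 1]

# Deduplicate with a plain (unordered) set, then impose a
# deterministic order by sorting afterward.
merged = sorted(set(itertools.chain(first_list, second_list)))
print(merged)  # [1, 2, 3, 4]
```

This gives sorted order, not the original list order; if the original order matters, a different approach is needed.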
+2  A: 
result = list(set(list1).union(set(list2)))

That's how I'd do it. I'm not sure about its performance, though, but it's certainly better than doing it by hand.

shylent
`set.union(self, other)` is fine with any iterable as `other`
kaizer.se
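To illustrate kaizer.se's point: since `set.union` accepts any iterable as its argument, the second `set()` call in the answer above is unnecessary. A small sketch (sample lists invented for demonstration):

```python
list1 = [1, 2, 3]
list2 = [3, 4, 5]

# set.union accepts any iterable, so wrapping list2 in set() first
# would only build a throwaway set.
result = list(set(list1).union(list2))
print(sorted(result))  # [1, 2, 3, 4, 5]
```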
+5  A: 

As Daniel states, a set cannot contain duplicate entries. So, first concatenate the lists:

list1 + list2

Then convert the new list to a set:

set(list1 + list2)

Then back to a list:

list(set(list1 + list2))
Rob Golding
Thanks for explaining what my code was doing. Beat me to it! :-) I'd just mention that the reason I edited my answer to use `itertools.chain()` instead of just concatenating the lists is because it avoids having to allocate a third large list in memory. The `set()` constructor doesn't actually need a list, it just needs an iterable that can iterate over all the elements, and `itertools.chain()` does that more efficiently (by avoiding copying).
Daniel Pryden
+7  A: 

I'd recommend something like this:

def combine_lists(list1, list2):
    s = set(list1)
    s.update(list2)
    return list(s)

This avoids creating a monster intermediate list from concatenating the first two.

Depending on what you're doing with the output, you may not need to convert back to a list at all. If ordering is important, you might need some sort of decorate/sort/undecorate shenanigans around this.

jcdyer
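If the original ordering does matter, one order-preserving alternative (not suggested in the answers above, offered here only as a sketch) relies on `dict` keys being unique and, in Python 3.7+, preserving insertion order:

```python
import itertools

def combine_preserving_order(list1, list2):
    # dict keys are unique and preserve insertion order (Python 3.7+),
    # so this deduplicates while keeping first-seen order.
    return list(dict.fromkeys(itertools.chain(list1, list2)))

print(combine_preserving_order([3, 1, 2], [2, 4, 1]))  # [3, 1, 2, 4]
```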
Agreed, there's no need to concatenate the two lists -- that just wastes memory. I'd be interested to see the performance difference between calling `s.update(list2)` versus the iterator approach I used above. Your approach may be slightly faster. However, as you point out, you get a much bigger performance savings by simply not converting back to a list at the end.
Daniel Pryden
I ran a few timeits, and which one is faster seems to vary, but never by more than 5% or 10% either way. I'd call it a draw.
jcdyer
Given that itertools is only chaining two objects, I'd say its impact is pretty minimal, so the question is whether there's a significant difference between set()ing one big list, or set()ing half that list and .update()ing the rest of it. Looks like there isn't.
jcdyer
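The timings discussed in these comments can be reproduced with something like the following sketch (list sizes and iteration counts are arbitrary; results will vary by machine and data):

```python
import timeit

setup = """
import itertools
list1 = list(range(100000))
list2 = list(range(50000, 150000))
"""

# Build one set from a chained iterator vs. set() plus update().
t_chain = timeit.timeit(
    "set(itertools.chain(list1, list2))", setup=setup, number=20)
t_update = timeit.timeit(
    "s = set(list1); s.update(list2)", setup=setup, number=20)
print(t_chain, t_update)
```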