For example:
>>> x = [1, 1, 2, 'a', 'a', 3]
>>> unique(x)
[1, 2, 'a', 3]
Assume list elements are hashable.
Clarification: The result should keep the first duplicate in the list. For example, [1, 2, 3, 2, 3, 1] becomes [1, 2, 3].
For example:
>>> x = [1, 1, 2, 'a', 'a', 3]
>>> unique(x)
[1, 2, 'a', 3]
Assume list elements are hashable.
Clarification: The result should keep the first duplicate in the list. For example, [1, 2, 3, 2, 3, 1] becomes [1, 2, 3].
I have no experience with python, but an algorithm would be to sort the list, then remove duplicates (by comparing to previous items in the list), and finally find the position in the new list by comparing with the old list.
Longer answer: http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/52560
I haven't done any tests, but one possible algorithm might be to create a second list, and iterate through the first list. If an item is not in the second list, add it to the second list.
x = [1, 1, 2, 'a', 'a', 3]
y = []
for each in x:
if each not in y:
y.append(each)
Taken from http://www.peterbe.com/plog/uniqifiers-benchmark
def f5(seq, idfun=None):
# order preserving
if idfun is None:
def idfun(x): return x
seen = {}
result = []
for item in seq:
marker = idfun(item)
# in old Python versions:
# if seen.has_key(marker)
# but in new ones:
if marker in seen: continue
seen[marker] = 1
result.append(item)
return result
>>> def unique(list):
... y = []
... for x in list:
... if x not in y:
... y.append(x)
... return y
What's going to be fastest depends on what percentage of your list is duplicates. If it's nearly all duplicates, with few unique items, creating a new list will probably be faster. If it's mostly unique items, removing them from the original list (or a copy) will be faster.
Here's one for modifying the list in place:
def unique(items):
seen = set()
for i in xrange(len(items)-1, -1, -1):
it = items[i]
if it in seen:
del items[i]
else:
seen.add(it)
Iterating backwards over the indices ensures that removing items doesn't affect the iteration.
O(n) if dict is hash, O(nlogn) if dict is tree, and simple, fixed. Thanks to Matthew for the suggestion. Sorry I don't know the underlying types.
def unique(x):
output = []
y = {}
for item in x:
y[item] = ""
for item in x:
if item in y:
output.append(item)
return output
def unique(items):
found = set([])
keep = []
for item in items:
if item not in found:
found.add(item)
keep.append(item)
return keep
print unique([1, 1, 2, 'a', 'a', 3])
You can actually do something really cool in Python to solve this. You can create a list comprehension that would reference itself as it is being built. As follows:
# remove duplicates...
def unique(my_list):
return [x for x in my_list if x not in locals()['_[1]'].__self__]
Edit: I removed the "self", and it works on Mac OS X, Python 2.5.1.
The _[1] is Python's "secret" reference to the new list. The above, of course, is a little messy, but you could adapt it fit your needs as necessary. For example, you can actually write a function that returns a reference to the comprehension; it would look more like:
return [x for x in my_list if x not in this_list()]
>>> x=[1,1,2,'a','a',3]
>>> y = [ _x for _x in x if not _x in locals()['_[1]'] ]
>>> y
[1, 2, 'a', 3]
"locals()['_[1]']" is the "secret name" of the list being created.
One-liner:
new_list = reduce(lambda x,y: x+[y][:1-int(y in x)], my_list, [])
Do the duplicates necessarily need to be in the list in the first place? There's no overhead as far as looking the elements up, but there is a little bit more overhead in adding elements (though the overhead should be O(1) ).
>>> x = []
>>> y = set()
>>> def add_to_x(val):
... if val not in y:
... x.append(val)
... y.add(val)
... print x
... print y
...
>>> add_to_x(1)
[1]
set([1])
>>> add_to_x(1)
[1]
set([1])
>>> add_to_x(1)
[1]
set([1])
>>>
has_key in python is O(1). Insertion and retrieval from a hash is also O(1). Loops through n items twice, so O(n).
def unique(list):
s = {}
output = []
for x in list:
count = 1
if(s.has_key(x)):
count = s[x] + 1
s[x] = count
for x in list:
count = s[x]
if(count > 0):
s[x] = 0
output.append(x)
return output
One pass.
a = [1,1,'a','b','c','c']
new_list = []
prev = None
while 1:
try:
i = a.pop(0)
if i != prev:
new_list.append(i)
prev = i
except IndexError:
break
Using:
lst = [8, 8, 9, 9, 7, 15, 15, 2, 20, 13, 2, 24, 6, 11, 7, 12, 4, 10, 18, 13, 23, 11, 3, 11, 12, 10, 4, 5, 4, 22, 6, 3, 19, 14, 21, 11, 1, 5, 14, 8, 0, 1, 16, 5, 10, 13, 17, 1, 16, 17, 12, 6, 10, 0, 3, 9, 9, 3, 7, 7, 6, 6, 7, 5, 14, 18, 12, 19, 2, 8, 9, 0, 8, 4, 5]
And using the timeit module:
$ python -m timeit -s 'import uniquetest' 'uniquetest.etchasketch(uniquetest.lst)'
(and so on for the various other functions -- which I named after their posters), I have the following results (on my first generation Intel MacBook Pro):
[1] Note that Allen modifies the list in place – I believe this has skewed the time, in that the timeit module runs the code 100000 times and 99999 of them are with the dupe-less list.
Summary: Straight-forward implementation with sets wins over confusing one-liners :-)
I don't know if this one is fast or not, but at least it is simple.
Simply, convert it first to a set and then again to a list
def unique(container):
return list(set(container))
This is the fastest in-place method I've found (assuming a large proportion of duplicates):
def unique(l):
s = set(); n = 0
for x in l:
if x not in s: s.add(x); l[n] = x; n += 1
del l[n:]
This is 10% faster than Allen's implementation, on which it is based (timed with timeit.repeat, JIT compiled by psyco). It keeps the first instance of any duplicate.
repton-infinity: I'd be interested if you could confirm my timings.
Benchmark and a clear anwser at :
http://www.peterbe.com/plog/uniqifiers-benchmark
There are some great, efficient solutions here. However, for anyone not concerned with the absolute most efficient O(n)
solution, I'd go with the simple one-liner O(n^2*log(n))
solution:
def unique(xs):
return sorted(set(xs), key=lambda x: xs.index(x))
or the more efficient two-liner O(n*log(n))
solution:
def unique(xs):
positions = dict((e,pos) for pos,e in reversed(list(enumerate(xs))))
return sorted(set(xs), key=lambda x: positions[x])
If you take out the empty list from the call to set() in Terhost's answer, you get a little speed boost.
Change:
found = set([])
to:
found = set()
However, you don't need the set at all.
def unique(items):
keep = []
for item in items:
if item not in keep:
keep.append(item)
return keep
Using timeit I got these results:
with set([]) -- 4.97210427363
with set() -- 4.65712377445
with no set -- 3.44865284975
Obligatory generator-based variation:
def unique(seq):
seen = set()
for x in seq:
if x not in seen:
seen.add(x)
yield x
a=[1,2,3,4,5,7,7,8,8,9,9,3,45]
def unique(l):
ids={}
for item in l:
if not ids.has_key(item):
ids[item]=item
return ids.keys()
print a
print unique(a)
Inserting elements will take theta(n) retrieving if element is exiting or not will take constant time testing all the items will take also theta(n) so we can see that this solution will take theta(n) Bear in Mind that dictionary in python implemented by hash table
Here is the fastest solution so far (for the following input):
def del_dups(seq):
seen = {}
pos = 0
for item in seq:
if item not in seen:
seen[item] = True
seq[pos] = item
pos += 1
del seq[pos:]
lst = [8, 8, 9, 9, 7, 15, 15, 2, 20, 13, 2, 24, 6, 11, 7, 12, 4, 10, 18,
13, 23, 11, 3, 11, 12, 10, 4, 5, 4, 22, 6, 3, 19, 14, 21, 11, 1,
5, 14, 8, 0, 1, 16, 5, 10, 13, 17, 1, 16, 17, 12, 6, 10, 0, 3, 9,
9, 3, 7, 7, 6, 6, 7, 5, 14, 18, 12, 19, 2, 8, 9, 0, 8, 4, 5]
del_dups(lst)
print(lst)
# -> [8, 9, 7, 15, 2, 20, 13, 24, 6, 11, 12, 4, 10, 18, 23, 3, 5, 22, 19, 14,
# 21, 1, 0, 16, 17]
Dictionary lookup is slightly faster then the set's one in Python 3.
An in-place one-liner for this:
>>> x = [1, 1, 2, 'a', 'a', 3]
>>> [ item for pos,item in enumerate(x) if x.index(item)==pos ]
[1, 2, 'a', 3]