ansaurus

Question

check if all elements in a list are identical

Answer 1

+1 A:

Doubt this is the "most Pythonic", but something like:

>>> falseList = [1,2,3,4]
>>> trueList = [1, 1, 1]
>>> 
>>> def testList(items):
...   for item in items[1:]:
...     if item != items[0]:
...       return False
...   return True
... 
>>> testList(falseList)
False
>>> testList(trueList)
True

would do the trick.

machineghost 2010-10-02 07:39:45

Answer 2

+10 A:

General method:

   def checkEqual1(iterator):
      try:
         iterator = iter(iterator)
         first = next(iterator)
         return all(first == rest for rest in iterator)
      except StopIteration:
         return True

One-liner:

    def checkEqual2(iterator):
       return len(set(iterator)) <= 1

Also one-liner:

    def checkEqual3(lst):
       return lst[1:] == lst[:-1]

The differences between the 3 versions are:

In checkEqual2 the content must be hashable.
checkEqual1 and checkEqual2 can take any iterable, but checkEqual3 must take a sequence, typically a concrete container like a list or tuple.
checkEqual1 stops as soon as a difference is found.
Since checkEqual1 contains more Python code, it is less efficient when many of the items at the beginning are equal.
Since checkEqual2 and checkEqual3 always perform O(N) copying operations, they take longer when most of your inputs return False.
checkEqual2 and checkEqual3 can't easily be adapted to compare a is b instead of a == b.
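
To make two of these points concrete, here is a minimal sketch. The names check_identical and check_equal_set are hypothetical and do not appear in the answer; the first is the checkEqual1 pattern with `==` swapped for `is`, the second is the set-based check applied to unhashable items.

```python
def check_identical(iterable):
    """Return True if every item is the very same object as the first one."""
    iterator = iter(iterable)
    try:
        first = next(iterator)
    except StopIteration:
        return True  # empty input counts as "all identical", as in checkEqual1
    # Identity comparison: `is` instead of `==`.
    return all(first is item for item in iterator)


def check_equal_set(iterable):
    """Set-based check; raises TypeError if any item is unhashable."""
    return len(set(iterable)) <= 1


shared = [9]
print(check_identical([shared, shared, shared]))  # True: one object, three references
print(check_identical([[9], [9], [9]]))           # False: equal but distinct objects

try:
    check_equal_set([[9], [9]])
except TypeError as exc:
    # Lists are unhashable, so the set cannot be built.
    print("set-based check failed:", exc)
```

The slice-based version has no obvious place to put `is`, because list equality always compares elements with `==` after an identity shortcut.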

timeit results for Python 2.7 (only s1, s4, s7 and s9 should return True):

s1 = [1] * 5000
s2 = [1] * 4999 + [2]
s3 = [2] + [1]*4999
s4 = [set([9])] * 5000
s5 = [set([9])] * 4999 + [set([10])]
s6 = [set([10])] + [set([9])] * 4999
s7 = [1,1]
s8 = [1,2]
s9 = []

we get

     checkEqual1   checkEqual2   checkEqual3   checkEqualIvo   checkEqual6502

s1     1.19 msec      348 usec      183 usec       51.6 usec         121 usec
s2     1.17 msec      376 usec      185 usec       50.9 usec         118 usec
s3     4.17 usec      348 usec      120 usec        264 usec        61.3 usec

s4     1.73 msec           n/a      182 usec       50.5 usec         121 usec
s5     1.71 msec           n/a      181 usec       50.6 usec         125 usec
s6     4.29 usec           n/a      122 usec        423 usec        61.1 usec

s7      3.1 usec      1.4 usec     1.24 usec      0.932 usec        1.92 usec
s8     4.07 usec     1.54 usec     1.28 usec      0.997 usec        1.79 usec
s9     5.91 usec     1.25 usec    0.749 usec      0.407 usec       0.386 usec

Note:

# http://stackoverflow.com/q/3844948/
def checkEqualIvo(lst):
    return not lst or lst.count(lst[0]) == len(lst)

# http://stackoverflow.com/q/3844931/
def checkEqual6502(lst):
    return not lst or [lst[0]]*len(lst) == lst

KennyTM 2010-10-02 07:43:37
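
A comparison like the one in the table can be reproduced with a small timeit harness. This is only a sketch: the snake_case function names, the CASES dictionary and the best_time_usec helper are made up for illustration, only the first three inputs are included, and the absolute numbers will differ by machine and Python version.

```python
import timeit


def check_equal_all(iterable):
    iterator = iter(iterable)
    try:
        first = next(iterator)
    except StopIteration:
        return True
    return all(first == item for item in iterator)


def check_equal_set(iterable):
    return len(set(iterable)) <= 1


def check_equal_slices(lst):
    return lst[1:] == lst[:-1]


def check_equal_count(lst):
    return not lst or lst.count(lst[0]) == len(lst)


def check_equal_repeat(lst):
    return not lst or [lst[0]] * len(lst) == lst


CANDIDATES = [
    check_equal_all,
    check_equal_set,
    check_equal_slices,
    check_equal_count,
    check_equal_repeat,
]

CASES = {
    "s1": [1] * 5000,
    "s2": [1] * 4999 + [2],
    "s3": [2] + [1] * 4999,
}


def best_time_usec(func, data, number=20, repeat=3):
    """Best per-call time in microseconds over `repeat` runs of `number` calls."""
    timings = timeit.repeat(lambda: func(data), repeat=repeat, number=number)
    return min(timings) / number * 1e6


for name, data in CASES.items():
    for func in CANDIDATES:
        usec = best_time_usec(func, data)
        print("%s  %-20s %10.1f usec" % (name, func.__name__, usec))
```

Taking the minimum over several repeats is the usual way to reduce noise from other processes; the lambda adds a small constant call overhead to every measurement.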

+1 for the probably optimal solution

delnan 2010-10-02 08:03:25

+1 for use of set to solve this the right way.

Gabriel 2010-10-02 08:15:18

Thank you, this is a really helpful explanation of alternatives. Can you please double check your performance table - is it all in msec, and are the numbers in the correct cells?

max 2010-10-02 08:26:11

@max: Yes. Note that 1 msec = 1000 usec.

KennyTM 2010-10-02 08:28:32

Don't forget memory usage analysis for very large arrays, a native solution which optimizes away calls to `obj.__eq__` when `lhs is rhs`, and out-of-order optimizations to allow short circuiting sorted lists more quickly.

Glenn Maynard 2010-10-02 08:31:12

Ivo van der Wijk has a better solution for sequences that's about 5 times faster than set and O(1) in memory.

aaronasterling 2010-10-02 08:32:18

@AaronMcSmooth: Without being a criticism of this answer, it's telling that this answer will probably remain at four times the score of the other: not due to comparative value, but due to the common early-answer and popular-answer vote bias of this site.

Glenn Maynard 2010-10-02 08:57:03

@Glenn Maynard: I agree with you (shock, horror!). Some upvoters are positively Gadarene :-)

John Machin 2010-10-02 09:40:47

This is the only really correct answer (CS-wise), since it does not always traverse the entire list.

nimrodm 2010-10-02 11:38:56

Answer 3

+5 A:

You can convert the list to a set. A set cannot have duplicates. So if all the elements in the original list are identical, the set will have just one element.

if len(sets.Set(input_list)) == 1:
    pass  # input_list has all identical elements (needs `import sets`, a Python 2-only module)

codaddict 2010-10-02 07:43:41

This is nice, but it doesn't short-circuit, and you have to calculate the length of the resulting set.

aaronasterling 2010-10-02 07:44:20

Why not just `len(set(input_list)) == 1`?

Nick D 2010-10-02 07:50:06

@Nick: Thanks for pointing that out.

codaddict 2010-10-02 07:53:46

@AaronMcSmooth: Still a noob in py. Don't even know what a short circuit in py means :)

codaddict 2010-10-02 07:55:05

@codaddict: It means that even if the first two elements are distinct, it will still complete the entire search. It also uses O(k) extra space, where k is the number of distinct elements in the list.

aaronasterling 2010-10-02 07:58:44
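
The difference can be observed directly by counting how many elements each approach pulls from its input. The following is a small sketch; the counting generator and the all_equal helper are hypothetical names introduced here, not code from the thread.

```python
def counting(items, counter):
    """Yield items one by one, recording in counter[0] how many were consumed."""
    for item in items:
        counter[0] += 1
        yield item


def all_equal(iterable):
    """Short-circuiting check: stops at the first element that differs."""
    iterator = iter(iterable)
    try:
        first = next(iterator)
    except StopIteration:
        return True
    return all(first == item for item in iterator)


data = [1, 2] + [1] * 998  # the mismatch is already at index 1

seen = [0]
all_equal(counting(data, seen))
consumed_by_all = seen[0]

seen = [0]
distinct = set(counting(data, seen))
consumed_by_set = seen[0]

print(consumed_by_all)   # 2: stops right after the first mismatch
print(consumed_by_set)   # 1000: the whole input is read to build the set
print(len(distinct))     # 2: the set stores one entry per distinct value
```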

Why the hell does this work faster than naive manual iteration through all the elements? It has to build a set, after all! But when I profiled this function, it ran 13 times faster than the naive implementation `for i in range(1, len(input_list)): if input_list[i-1] != input_list[i]: return False` (otherwise `return True`), with `input_list = ['x'] * 100000000`.

max 2010-10-02 08:16:55

@max: Because building the set happens in C, and you have a bad implementation. You should at least do it in a generator expression. See KennyTM's answer for how to do it correctly without using a set.

aaronasterling 2010-10-02 08:20:18

Answer 4

A:

>>> a = [1, 2, 3, 4, 5, 6]
>>> z = [(a[x], a[x+1]) for x in range(0, len(a)-1)]
>>> z
[(1, 2), (2, 3), (3, 4), (4, 5), (5, 6)]
# Replacing it with the test
>>> z = [(a[x] == a[x+1]) for x in range(0, len(a)-1)]
>>> z
[False, False, False, False, False]
>>> if False in z: print("Not all elements are equal")

pyfunc 2010-10-02 07:45:27
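
The same neighbour-by-neighbour comparison can be written so that it stops at the first unequal pair instead of building the whole list of booleans. This is a sketch only; all_adjacent_equal is a hypothetical name, and the slice `seq[1:]` still copies the sequence once.

```python
def all_adjacent_equal(seq):
    """Return True if every element equals its right-hand neighbour."""
    # zip pairs (seq[0], seq[1]), (seq[1], seq[2]), ...; all() stops at the
    # first pair that compares unequal.
    return all(a == b for a, b in zip(seq, seq[1:]))


a = [1, 2, 3, 4, 5, 6]
if not all_adjacent_equal(a):
    print("Not all elements are equal")

print(all_adjacent_equal([1, 1, 1]))  # True
print(all_adjacent_equal([]))         # True: there are no pairs to compare
```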

Answer 5

+1 A:

This is a simple way of doing it:

result = mylist and all(mylist[0] == elem for elem in mylist)

This is slightly more complicated and incurs function call overhead, but the semantics are more clearly spelled out:

def all_identical(seq):
    if not seq:
        # empty list is False.
        return False
    first = seq[0]
    return all(first == elem for elem in seq)

Jerub 2010-10-02 08:11:43
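
One detail worth seeing in action is what each form returns for an empty list. In the sketch below, the empty_result parameter is a hypothetical addition that makes the empty-list policy explicit; everything else follows the two snippets above.

```python
def all_identical(seq, empty_result=False):
    """Like the function above, but the result for empty input is a parameter."""
    if not seq:
        return empty_result
    first = seq[0]
    return all(first == elem for elem in seq)


empty = []
# `and` returns its first operand when that operand is falsy, so the one-liner
# yields the empty list itself rather than False.
one_liner_result = empty and all(empty[0] == elem for elem in empty)
print(repr(one_liner_result))                   # []
print(all_identical(empty))                     # False
print(all_identical(empty, empty_result=True))  # True
print(all_identical([3, 3, 3]))                 # True
```

The one-liner's result is still falsy, so it behaves the same in an `if` test; it only matters when the value is stored or compared with `is False`.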

Answer 6

+3 A:

This is another option, faster than len(set(x))==1 for long lists (uses short circuit)

def constantList(x):
    return x and [x[0]]*len(x) == x

6502 2010-10-02 08:22:50

It is 3 times slower than the set solution on my computer, ignoring short circuit. So if the unequal element is found on average in the first third of the list, it's faster on average.

max 2010-10-02 09:21:46

Answer 7

+8 A:

A solution that is faster than using set() and works on sequences (not iterables) is to simply count the occurrences of the first element. This assumes the list is non-empty (that is trivial to check, and you can decide for yourself what the outcome should be for an empty list).

x.count(x[0]) == len(x)

Some simple benchmarks:

>>> timeit.timeit('len(set(s1))<=1', 's1=[1]*5000', number=10000)
1.4383411407470703
>>> timeit.timeit('len(set(s1))<=1', 's1=[1]*4999+[2]', number=10000)
1.4765670299530029
>>> timeit.timeit('s1.count(s1[0])==len(s1)', 's1=[1]*5000', number=10000)
0.26274609565734863
>>> timeit.timeit('s1.count(s1[0])==len(s1)', 's1=[1]*4999+[2]', number=10000)
0.25654196739196777

Ivo van der Wijk 2010-10-02 08:25:21

OMG, this is 6 times faster than the set solution! (280 million elements/sec vs 45 million elements/sec on my laptop). Why??? And is there any way to modify it so that it short circuits (I guess not...)

max 2010-10-02 09:18:59

I guess list.count has a highly optimized C implementation, and the length of the list is stored internally, so len() is cheap as well. There is no way to short-circuit count(), since you really need to check all elements to get the correct count.

Ivo van der Wijk 2010-10-02 10:01:26

Can I change it to: `x.count(next(x)) == len(x)` so that it works for any container x? Ahh.. nm, just saw that .count is only available for sequences.. Why isn't it implemented for other builtin containers? Is counting inside a dictionary inherently less meaningful than inside a list?

max 2010-10-05 05:09:33

An iterator may not have a length. E.g. it can be infinite or just dynamically generated. You can only find its length by converting it to a list, which takes away most of the iterator's advantages.

Ivo van der Wijk 2010-10-05 05:51:12
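
For inputs without a length, a check that only ever calls next() still works. The sketch below uses itertools.groupby, which is a swapped-in technique and not something proposed in this thread; the name all_equal is hypothetical. It returns as soon as a second run of values starts, so it terminates on an infinite iterator that contains a differing element, but it would run forever on an infinite stream of equal values.

```python
import itertools


def all_equal(iterable):
    """True if the input has at most one run of equal consecutive values."""
    groups = itertools.groupby(iterable)
    # The first next() consumes the start of the first run (or returns True
    # for empty input); the second next() finds a value only if a different
    # element exists.
    return next(groups, True) and not next(groups, False)


print(all_equal(x * 0 for x in range(5)))   # True: a generator has no len()
print(all_equal(iter([])))                  # True: empty input

# An infinite iterator: 1, 2, 1, 1, 1, ... The check stops at the 2.
endless = itertools.chain([1, 2], itertools.repeat(1))
print(all_equal(endless))                   # False
```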
