views:

304

answers:

7

I need the following function:

Input: a list.

Output: True if all elements in the input list evaluate as equal to each other using the standard equality operator; False otherwise.

Performance: of course, I prefer not to incur any unnecessary overhead.

I feel it would be best to iterate through the list, compare adjacent elements, and AND all the resulting Boolean values. But I'm not sure what's the most Pythonic way to do that.


EDIT:

Thank you for all the great answers. I rated up several, and it was really hard to choose between KennyTM and Ivo van der Wijk...

+1  A: 

Doubt this is the "most Pythonic", but something like:

>>> falseList = [1,2,3,4]
>>> trueList = [1, 1, 1]
>>> 
>>> def testList(list):
...   for item in list[1:]:
...     if item != list[0]:
...       return False
...   return True
... 
>>> testList(falseList)
False
>>> testList(trueList)
True

would do the trick.

machineghost
+10  A: 

General method:

   def checkEqual1(iterator):
      try:
         iterator = iter(iterator)
         first = next(iterator)
         return all(first == rest for rest in iterator)
      except StopIteration:
         return True

One-liner:

    def checkEqual2(iterator):
       return len(set(iterator)) <= 1

Also one-liner:

    def checkEqual3(lst):
       return lst[1:] == lst[:-1]

The difference between the 3 versions are that:

  1. In checkEqual2 the content must be hashable.
  2. checkEqual1 and checkEqual2 can use any iterators, but checkEqual3 must take a sequence input, typically concrete containers like a list or tuple.
  3. checkEqual1 stops as soon as a difference is found.
  4. Since checkEqual1 contains more Python code, it is less efficient when many of the items are equal in the beginning.
  5. Since checkEqual2 and checkEqual3 always perform O(N) copying operations, they will take longer if most of your input will return False.
  6. checkEqual2 and checkEqual3 can't be easily changed to adopt to compare a is b instead of a == b.

timeit result, for Python 2.7 and (only s1, s4, s7, s9 should return True)

s1 = [1] * 5000
s2 = [1] * 4999 + [2]
s3 = [2] + [1]*4999
s4 = [set([9])] * 5000
s5 = [set([9])] * 4999 + [set([10])]
s6 = [set([10])] + [set([9])] * 4999
s7 = [1,1]
s8 = [1,2]
s9 = []

we get

     checkEqual1  checkEqual2   checkEqual3 checkEqualIvo checkEqual6502

s1 1.19     msec  348    usec  183     usec   51.6   usec   121     usec
s2 1.17     msec  376    usec  185     usec   50.9   usec   118     usec
s3     4.17 usec  348    usec  120     usec  264     usec    61.3   usec

s4 1.73     msec               182     usec   50.5   usec   121     usec
s5 1.71     msec               181     usec   50.6   usec   125     usec
s6     4.29 usec               122     usec  423     usec    61.1   usec

s7     3.1  usec    1.4  usec    1.24  usec    0.932 usec     1.92  usec
s8     4.07 usec    1.54 usec    1.28  usec    0.997 usec     1.79  usec
s9     5.91 usec    1.25 usec    0.749 usec    0.407 usec     0.386 usec

Note:

# http://stackoverflow.com/q/3844948/
def checkEqualIvo(lst):
    return not lst or lst.count(lst[0]) == len(lst)

# http://stackoverflow.com/q/3844931/
def checkEqual6502(lst):
    return not lst or [lst[0]]*len(lst) == lst
KennyTM
+1 for the propably optimal solution
delnan
+1 for use of set to solve this the right way.
Gabriel
Thank you, this is a really helpful explanation of alternatives. Can you please double check your performance table - is it all in msec, and are the numbers in the correct cells?
max
@max: Yes. Note that 1 msec = 1000 usec.
KennyTM
Don't forget memory usage analysis for very large arrays, a native solution which optimizes away calls to `obj.__eq__` when `lhs is rhs`, and out-of-order optimizations to allow short circuiting sorted lists more quickly.
Glenn Maynard
Ivo van der Wijk has a better solution for sequences that's about 5 times faster than set and O(1) in memory.
aaronasterling
@AaronMcSmooth: Without being a criticism of this answer, it's telling that this answer will probably remain at four times the score of the other: not due to comparative value, but due to the common early-answer and popular-answer vote bias of this site.
Glenn Maynard
@Glenn Maynard: I agree with you (shock, horror!). Some upvoters are positively Gadarene :-)
John Machin
This is the only really correct (CS-wise) since it does not always traverse the entire list.
nimrodm
+5  A: 

You can convert the list to a set. A set cannot have duplicates. So if all the elements in the original list are identical, the set will have just one element.

if len(sets.Set(input_list)) == 1
// input_list has all identical elements.
codaddict
this is nice but it doesn't short circuit and you have to calculate the length of the resulting list.
aaronasterling
why not just `len(set(input_list)) == 1`?
Nick D
@Nick: Thanks for pointing.
codaddict
@AaronMcSmooth: Still a noob in py. Don't even know what a short circut in py means :)
codaddict
@codaddict. It means that even if the first two elements are distinct, it will still complete the entire search. it also uses O(k) extra space where k is the number of distinct elements in the list.
aaronasterling
Why the hell does this work faster than the naive manual iteration through all elements?? It has to build a set after all! But when I profiled this function, it worked 13 times faster than the naive implementation `for i in range(1, len(input_list)): if input_list[i-1] != input_list[i]: return False #otherwise return True` I set `input_list = ['x'] * 100000000`
max
@max. because building the set happens in C and you have a bad implementation. You should at least do it in a generator expression. See KennyTM's answer for how to do it correctly without using a set.
aaronasterling
A: 
>>> a = [1, 2, 3, 4, 5, 6]
>>> z = [(a[x], a[x+1]) for x in range(0, len(a)-1)]
>>> z
[(1, 2), (2, 3), (3, 4), (4, 5), (5, 6)]
# Replacing it with the test
>>> z = [(a[x] == a[x+1]) for x in range(0, len(a)-1)]
>>> z
[False, False, False, False, False]
>>> if False in z : Print "All elements are not equal"
pyfunc
+1  A: 

This is a simple way of doing it:

result = mylist and all(mylist[0] == elem for elem in mylist)

This is slightly more complicated, it incurs function call overhead, but the semantics are more clearly spelled out:

def all_identical(seq):
    if not seq:
        # empty list is False.
        return False
    first = seq[0]
    return all(first == elem for elem in seq)
Jerub
+3  A: 

This is another option, faster than len(set(x))==1 for long lists (uses short circuit)

def constantList(x):
    return x and [x[0]]*len(x) == x
6502
It is 3 times slower than the set solution on my computer, ignoring short circuit. So if the unequal element is found on average in the first third of the list, it's faster on average.
max
+8  A: 

A solution faster than using set() that works on sequences (not iterables) is to simply count the first element. This assumes the list is non-empty (but that's trivial to check, and decide yourself what the outcome should be on an empty list)

x.count(x[0]) == len(x)

some simple benchmarks:

>>> timeit.timeit('len(set(s1))<=1', 's1=[1]*5000', number=10000)
1.4383411407470703
>>> timeit.timeit('len(set(s1))<=1', 's1=[1]*4999+[2]', number=10000)
1.4765670299530029
>>> timeit.timeit('s1.count(s1[0])==len(s1)', 's1=[1]*5000', number=10000)
0.26274609565734863
>>> timeit.timeit('s1.count(s1[0])==len(s1)', 's1=[1]*4999+[2]', number=10000)
0.25654196739196777
Ivo van der Wijk
OMG, this is 6 times faster than the set solution! (280 million elements/sec vs 45 million elements/sec on my laptop). Why??? And is there any way to modify it so that it short circuits (I guess not...)
max
I guess list.count has a highly optimized C implementation, and the length of the list is stored internally, so len() is cheap as well. There's not a way to short-circuit count() since you will need to really check all elements to get the correct count.
Ivo van der Wijk
Can I change it to: `x.count(next(x)) == len(x)` so that it works for any container x? Ahh.. nm, just saw that .count is only available for sequences.. Why isn't it implemented for other builtin containers? Is counting inside a dictionary inherently less meaningful than inside a list?
max
An iterator may not have a length. E.g. it can be infinite or just dynamically generated. You can only find its length by converting it to a list which takes away most of the iterators advantages
Ivo van der Wijk