tags:

views:

404

answers:

8

Does python provide an elegant way to check for "equality" of sequences of different types? The following work, but they seem rather ugly and verbose for python code:

def comp1(a, b):
    if len(a) != len(b):
        return False
    for i, v in enumerate(a):
        if v != b[i]:
            return False
    return True

The following is a bit shorter, but also less efficient since a third sequence is created:

def comp2(a, b):
    for l, r in map(None, a, b):
        if l != r:
            return False
    return True

Shoehorning one of those examples into a list comprehension isn't really what I'm looking for either.

Edit: Ideally I am looking for a solution that doesn't create another sequence during the comparison.

+12  A: 

Convert both sequences to lists, and use builtin list comparison. It should be sufficient, unless your sequences are really large.

list(a) == list(b)

Edit:

Testing done by schickb shows that using tuples is slightly faster:

tuple(a) == tuple(b)
Ayman Hourieh
That creates two additional lists. Since the lists can be long, I'd like to avoid this.
schickb
@schickb: How long are we talking about? Based on the title and the first sentence of your post, it's fair to put elegance as the top priority and efficiency as a bonus. To me, conversion to (new) lists is by far the most elegant and most "programmer-efficient" solution.
John Y
@John, yeah I agree after some more testing. This solution is actually much faster than even the enumerate loop. I'm sure that changes at some sequence size however. I'll mark this one as the answer even though tuple(a) == tuple(b) seems to be better.
schickb
@schickb - I added the tuple approach to the answer.
Ayman Hourieh
A: 

There's always:

list(a) == list(b)
outis
+9  A: 

You can determine the equality of any two iterables (strings, tuples, lists, even custom sequences) without creating and storing duplicate lists by using the following:

all(x == y for x, y in itertools.izip_longest(a, b))

Note that if the two iterables are not the same length, the shorter one will be padded with Nones. In other words, it will consider [1, 2, None] to be equal to (1, 2).

Edit: As Kamil points out in the comments, izip_longest is only available in Python 2.6. However, the docs for the function also provide an alternate implementation which should work all the way back to 2.3.

Edit 2: After testing on a few different machines, it looks like this is only faster than list(a) == list(b) in certain circumstances, which I can't isolate. Most of the time, it takes about seven times as long. However, I also found tuple(a) == tuple(b) to be consistently at least twice as fast as the list version.

Ben Blank
Exactly what I was about to say, except I forgot about the usage of all(). The other caveat with this solution is that it requires Python 2.6 or higher.
Kamil Kisiel
Nice and short, but that list comprehension creates a third sequence.
schickb
@Kamil — True! The 2.6 docs for itertools offer a 2.5-friendly solution, as I recall, I'll see if I can't dig up a link.
Ben Blank
@schickb — Not so, that's a generator, not a list comprehension. It creates a generator object which only creates each element when it's ready to emit it.
Ben Blank
This solution is really neat. I really hate to say it, it is about 5 times slower than converting to lists
Nadia Alramli
Nadia: you mentioned that on my answer as well, but I can't reproduce your results. On any large input, lists are significantly slower for me than generators.
John Millikin
@Nadia — My results match John's. What tests are you running to compare speeds?
Ben Blank
@Ben and vili: Even worse than a third sequence, then. A function call per entry.
schickb
@schickb — I see no function calls here, aside from `izip_longest` itself, which is only run once…? The overhead of iterating over the generator is no different than that of a standard `for` loop.
Ben Blank
@John, my test was comparing a=xrange(100), b=xrange(100) 100000 times. Now I did another test on much larger lists with 1000000 entries and your algorithm was indeed fast
Nadia Alramli
@Ben, your method is still slower with 1000000 entries that both converting to list and John's izip method
Nadia Alramli
@Nadia — Looks like it may have been a quirk of the particular machine I tested it on. I've tried a few others and it takes about seven times as long as the list version.
Ben Blank
@Ben: Well not a function call per entry, but the added function call to the generator. Test the following and you'll see that the first is slower. Of course the second consumes much more memory. On my system both are *far* slower than the list comparison.t1 = timeit.Timer("all(x == y for x, y in itertools.izip_longest(a, b))", "a=b=[1]*100")t2 = timeit.Timer("all([x == y for x, y in itertools.izip_longest(a, b)])", "a=b=[1]*100")
schickb
+1  A: 

Since you put the word "equality" in quotes, I assume that you would like to know how the lists are the same and how the are different. Check out difflib which has a SequenceMatcher class:

    sm = difflib.SequenceMatcher(None, a, b)
    for opcode in sm.get_opcodes():
        print "    (%s %d:%d %d:%d)" % opcode

You will get back a sequences of descriptions of the differences. It's fairly simple to turn that into diff-like output.

Joel
A: 

It's probably not as efficient, but it looks funky:

def cmpLists(a, b):
    return len(a) == len(b) and (False not in [a[i] == b[i] for i in range(0,len(a)])

I don't know the "all" function that Ben mentioned, but perhaps you could use that instead of "False not in"

Smashery
+2  A: 

It looks like tuple(a) == tuple(b) is the best overall choice. Or perhaps tuple comparison with a preceding len check if they'll often be different lengths. This does create extra lists, but hopefully not an issue except for really huge lists. Here is my comparison of the various alternatives suggested:

import timeit

tests = (
'''
a=b=[5]*100
''',

'''
a=[5]*100
b=[5]*3
''',

'''
a=b=(5,)*100
''',

'''
a=b="This on is a string" * 5
''',

'''
import array
a=b=array.array('B', "This on is a string" * 5)
'''
)

common = '''import itertools
def comp1(a, b):
    if len(a) != len(b):
        return False
    for i, v in enumerate(a):
        if v != b[i]:
            return False
    return True'''

for i, setup in enumerate(tests):
    t1 = timeit.Timer("comp1(a, b)", setup + common)
    t2 = timeit.Timer("all(x == y for x, y in itertools.izip_longest(a, b))", setup + common)
    t3 = timeit.Timer("all([x == y for x, y in itertools.izip_longest(a, b)])", setup + common)
    t4 = timeit.Timer("list(a) == list(b)", setup + common)
    t5 = timeit.Timer("tuple(a) == tuple(b)", setup + common)

    print '==test %d==' % i
    print '   comp1: %g' % t1.timeit()
    print ' all gen: %g' % t2.timeit()
    print 'all list: %g' % t3.timeit()
    print '    list: %g' % t4.timeit()
    print '   tuple: %g\n' % t5.timeit()

Here are the results:

==test 0==
   comp1: 27.8089
 all gen: 31.1406
all list: 29.4887
    list: 3.58438
   tuple: 3.25859

==test 1==
   comp1: 0.833313
 all gen: 3.8026
all list: 33.5288
    list: 1.90453
   tuple: 1.74985

==test 2==
   comp1: 30.606
 all gen: 31.4755
all list: 29.5637
    list: 3.56635
   tuple: 1.60032

==test 3==
   comp1: 33.3725
 all gen: 35.3699
all list: 34.2619
    list: 10.2443
   tuple: 10.1124

==test 4==
   comp1: 31.7014
 all gen: 32.0051
all list: 31.0664
    list: 8.35031
   tuple: 8.16301

Edit: Added a few more tests. This was run on an AMD 939 3800+ with 2GB of ram. Linux 32bit, Python 2.6.2

schickb
Now run all the same tests with Psyco .
Brian
A: 

This "functional" code should be fast and generic enough for all purposes.

# python 2.6 ≤ x < 3.0
import operator, itertools as it

def seq_cmp(seqa, seqb):
    return all(it.starmap(operator.eq, it.izip_longest(seqa, seqb)))

If on Python 2.5, use the definition for izip_longest from there.

ΤΖΩΤΖΙΟΥ
A: 

Apart from the extra memory used by creating temporary lists/tuples, those answers will lose out to short circuiting generator solutions for large sequences when the inequality occurs early in the sequences

from itertools import starmap, izip
from operator import eq
all(starmap(eq, izip(x,y)))

or more concisely

from itertools import imap
from operator import eq
all(imap(eq, x, y))

some benchmarks from ipython

x=range(1000)
y=range(1000);y[10]=0

timeit tuple(x)==tuple(y)
100000 loops, best of 3: 16.9 us per loop

timeit all(imap(eq,x,y))
100000 loops, best of 3: 2.86 us per loop
gnibbler