ansaurus

Question

Answer 1

+12 A:

Convert both sequences to lists, and use builtin list comparison. It should be sufficient, unless your sequences are really large.

list(a) == list(b)

Edit:

Testing done by schickb shows that using tuples is slightly faster:

tuple(a) == tuple(b)

Ayman Hourieh 2009-05-22 23:36:48

That creates two additional lists. Since the lists can be long, I'd like to avoid this.

schickb 2009-05-22 23:44:02

@schickb: How long are we talking about? Based on the title and the first sentence of your post, it's fair to put elegance as the top priority and efficiency as a bonus. To me, conversion to (new) lists is by far the most elegant and most "programmer-efficient" solution.

John Y 2009-05-23 00:25:23

@John, yeah I agree after some more testing. This solution is actually much faster than even the enumerate loop. I'm sure that changes at some sequence size however. I'll mark this one as the answer even though tuple(a) == tuple(b) seems to be better.

schickb 2009-05-23 01:32:26

@schickb - I added the tuple approach to the answer.

Ayman Hourieh 2009-05-23 01:48:05

Answer 2

A:

There's always:

list(a) == list(b)

outis 2009-05-22 23:37:45

Answer 3

+9 A:

You can determine the equality of any two iterables (strings, tuples, lists, even custom sequences) without creating and storing duplicate lists by using the following:

all(x == y for x, y in itertools.izip_longest(a, b))

Note that if the two iterables are not the same length, the shorter one will be padded with Nones. In other words, it will consider [1, 2, None] to be equal to (1, 2).

Edit: As Kamil points out in the comments, izip_longest is only available in Python 2.6. However, the docs for the function also provide an alternate implementation which should work all the way back to 2.3.

Edit 2: After testing on a few different machines, it looks like this is only faster than list(a) == list(b) in certain circumstances, which I can't isolate. Most of the time, it takes about seven times as long. However, I also found tuple(a) == tuple(b) to be consistently at least twice as fast as the list version.

Ben Blank 2009-05-22 23:45:37

Exactly what I was about to say, except I forgot about the usage of all(). The other caveat with this solution is that it requires Python 2.6 or higher.

Kamil Kisiel 2009-05-22 23:48:55

Nice and short, but that list comprehension creates a third sequence.

schickb 2009-05-22 23:50:36

@Kamil — True! The 2.6 docs for itertools offer a 2.5-friendly solution, as I recall, I'll see if I can't dig up a link.

Ben Blank 2009-05-22 23:51:41

@schickb — Not so, that's a generator, not a list comprehension. It creates a generator object which only creates each element when it's ready to emit it.

Ben Blank 2009-05-22 23:52:21

This solution is really neat. I really hate to say it, it is about 5 times slower than converting to lists

Nadia Alramli 2009-05-22 23:53:21

Nadia: you mentioned that on my answer as well, but I can't reproduce your results. On any large input, lists are significantly slower for me than generators.

John Millikin 2009-05-22 23:54:34

@Nadia — My results match John's. What tests are you running to compare speeds?

Ben Blank 2009-05-22 23:56:41

@Ben and vili: Even worse than a third sequence, then. A function call per entry.

schickb 2009-05-22 23:57:06

@schickb — I see no function calls here, aside from `izip_longest` itself, which is only run once…? The overhead of iterating over the generator is no different than that of a standard `for` loop.

Ben Blank 2009-05-22 23:58:13

@John, my test was comparing a=xrange(100), b=xrange(100) 100000 times. Now I did another test on much larger lists with 1000000 entries and your algorithm was indeed fast

Nadia Alramli 2009-05-23 00:00:33

@Ben, your method is still slower with 1000000 entries that both converting to list and John's izip method

Nadia Alramli 2009-05-23 00:03:21

@Nadia — Looks like it may have been a quirk of the particular machine I tested it on. I've tried a few others and it takes about seven times as long as the list version.

Ben Blank 2009-05-23 00:12:55

@Ben: Well not a function call per entry, but the added function call to the generator. Test the following and you'll see that the first is slower. Of course the second consumes much more memory. On my system both are *far* slower than the list comparison.t1 = timeit.Timer("all(x == y for x, y in itertools.izip_longest(a, b))", "a=b=[1]*100")t2 = timeit.Timer("all([x == y for x, y in itertools.izip_longest(a, b)])", "a=b=[1]*100")

schickb 2009-05-23 00:20:28

Answer 4

+1 A:

Since you put the word "equality" in quotes, I assume that you would like to know how the lists are the same and how the are different. Check out difflib which has a SequenceMatcher class:

    sm = difflib.SequenceMatcher(None, a, b)
    for opcode in sm.get_opcodes():
        print "    (%s %d:%d %d:%d)" % opcode

You will get back a sequences of descriptions of the differences. It's fairly simple to turn that into diff-like output.

Joel 2009-05-23 00:03:33

Answer 5

A:

It's probably not as efficient, but it looks funky:

def cmpLists(a, b):
    return len(a) == len(b) and (False not in [a[i] == b[i] for i in range(0,len(a)])

I don't know the "all" function that Ben mentioned, but perhaps you could use that instead of "False not in"

Smashery 2009-05-23 00:05:15

Answer 6

+2 A:

It looks like tuple(a) == tuple(b) is the best overall choice. Or perhaps tuple comparison with a preceding len check if they'll often be different lengths. This does create extra lists, but hopefully not an issue except for really huge lists. Here is my comparison of the various alternatives suggested:

import timeit

tests = (
'''
a=b=[5]*100
''',

'''
a=[5]*100
b=[5]*3
''',

'''
a=b=(5,)*100
''',

'''
a=b="This on is a string" * 5
''',

'''
import array
a=b=array.array('B', "This on is a string" * 5)
'''
)

common = '''import itertools
def comp1(a, b):
    if len(a) != len(b):
        return False
    for i, v in enumerate(a):
        if v != b[i]:
            return False
    return True'''

for i, setup in enumerate(tests):
    t1 = timeit.Timer("comp1(a, b)", setup + common)
    t2 = timeit.Timer("all(x == y for x, y in itertools.izip_longest(a, b))", setup + common)
    t3 = timeit.Timer("all([x == y for x, y in itertools.izip_longest(a, b)])", setup + common)
    t4 = timeit.Timer("list(a) == list(b)", setup + common)
    t5 = timeit.Timer("tuple(a) == tuple(b)", setup + common)

    print '==test %d==' % i
    print '   comp1: %g' % t1.timeit()
    print ' all gen: %g' % t2.timeit()
    print 'all list: %g' % t3.timeit()
    print '    list: %g' % t4.timeit()
    print '   tuple: %g\n' % t5.timeit()

Here are the results:

==test 0==
   comp1: 27.8089
 all gen: 31.1406
all list: 29.4887
    list: 3.58438
   tuple: 3.25859

==test 1==
   comp1: 0.833313
 all gen: 3.8026
all list: 33.5288
    list: 1.90453
   tuple: 1.74985

==test 2==
   comp1: 30.606
 all gen: 31.4755
all list: 29.5637
    list: 3.56635
   tuple: 1.60032

==test 3==
   comp1: 33.3725
 all gen: 35.3699
all list: 34.2619
    list: 10.2443
   tuple: 10.1124

==test 4==
   comp1: 31.7014
 all gen: 32.0051
all list: 31.0664
    list: 8.35031
   tuple: 8.16301

Edit: Added a few more tests. This was run on an AMD 939 3800+ with 2GB of ram. Linux 32bit, Python 2.6.2

schickb 2009-05-23 00:34:34

Now run all the same tests with Psyco .

Brian 2010-08-09 14:04:33

Answer 7

A:

This "functional" code should be fast and generic enough for all purposes.

# python 2.6 ≤ x < 3.0
import operator, itertools as it

def seq_cmp(seqa, seqb):
    return all(it.starmap(operator.eq, it.izip_longest(seqa, seqb)))

If on Python 2.5, use the definition for izip_longest from there.

ΤΖΩΤΖΙΟΥ 2009-05-24 12:29:19

Answer 8

A:

Apart from the extra memory used by creating temporary lists/tuples, those answers will lose out to short circuiting generator solutions for large sequences when the inequality occurs early in the sequences

from itertools import starmap, izip
from operator import eq
all(starmap(eq, izip(x,y)))

or more concisely

from itertools import imap
from operator import eq
all(imap(eq, x, y))

some benchmarks from ipython

x=range(1000)
y=range(1000);y[10]=0

timeit tuple(x)==tuple(y)
100000 loops, best of 3: 16.9 us per loop

timeit all(imap(eq,x,y))
100000 loops, best of 3: 2.86 us per loop

gnibbler 2010-08-09 12:30:27

ansaurus

tags:

views:

answers:

Elegant way to compare sequences

related questions