ansaurus

Question

Testing if a list contains another list with Python

Answer 1

A:

[Note this answer doesn't work. My bad.]

If all items are unique, you can use sets.

>>> items = set([-1, 0, 1, 2])
>>> items.issubset(set([1, 2]))
True
>>> items.issubset(set([1, 3]))
False

Thomas O 2010-10-02 20:22:47

that's not what op is looking for

SilentGhost 2010-10-02 20:24:50

It would work for unique lists. In fact it would work for non-unique items, but it wouldn't be able to determine between the individual items. (so you couldn't compare between [1, 1, 2] and [1, 2].)

Thomas O 2010-10-02 20:25:54

Okay, that's why I used the qualifier "if all items are unique." I didn't realise, as your example didn't make it clear that you needed distinction between identical items.

Thomas O 2010-10-02 20:28:12

It still won't work, even if all items are unique, unless they are also in the same order, with no intermingled items. e.g. finding `[1, 3]` or `[2, 1]` in `[1, 2, 3]` will give a false positive. Assuming that we're looking for the sequence itself rather than just the values contained in the sequence.

intuited 2010-10-02 20:34:59

Oh. Thanks for pointing that out. eumiro's answer looks better then.

Thomas O 2010-10-02 20:37:03

@Thomas O When you post an answer that is a) wrong, b) not what OP is looking for or C) all of the above, there is a 'delete' link at the bottom left of your post. feel free to click it and 1) declutter the thread, 2) avoid exposing yourself to further -1's.

aaronasterling 2010-10-02 21:51:16

I think it's better to show how an idea can be wrong (so as to avoid it in future), rather than erasing it entirely.

Thomas O 2010-10-02 22:03:20

Answer 2

+3 A:

After OP's edit:

def contains(small, big):
    for i in xrange(1 + len(big) - len(small)):
        if small == big[i:i+len(small)]:
            return i, i + len(small) - 1
    return False

eumiro 2010-10-02 20:26:11

But it fails with contains([1,2], [-1, 0, 1, 1, 2]) which returns [2,4] instead of what I assume is the expected [3,4]

Andrew Dalke 2010-10-02 20:40:08

Now it works with all OP's tests.

eumiro 2010-10-02 20:49:02

This is going to be horribly inefficient for big lists, since it is constantly creating and destroying temporary lists every time it does `big[i:i+len(small)]`

Dave Kirby 2010-10-02 21:22:23

@Dave: did you timeit this against your solution?

SilentGhost 2010-10-02 21:41:48

According to my tests, this has slightly better performance than Dave Kirby's solution, even on large lists (1 million elements, with the matching subset at the end): 4.1s for 10 repetitions versus 5.6s for Dave's. I would love to post my test code, but there isn't an easy way to do that.

Tim Yates 2010-10-02 22:20:23

@Tim Yates: Could you post your test code in a [gist](gist.github.com) or other pastebin-type deal? Maybe [codepad](codepad.org) would be a better option, though I don't know if running it on that site would be a) possible; b) reliable. It seems to be borken at the moment anyway.

intuited 2010-10-02 22:34:20

**UPDATE**: I spoke too soon--my small lists were too small. This algorithm exploded once I increased their size to 1000, while the others stayed constant. It looks like Dave Kirby's wins for large lists after all. http://pastebin.com/NZwU6PUx

Tim Yates 2010-10-02 22:46:22

This should return a tuple, not a list. A list is completely inappropriate for this usage. Corrected in the original post.

chpwn 2010-10-03 00:11:09

Answer 3

+2 A:

Here is my version:

def contains(small, big):
    for i in xrange(len(big)-len(small)+1):
        for j in xrange(len(small)):
            if big[i+j] != small[j]:
                break
        else:
            return i, i+len(small)
    return False

It returns a tuple of (start, end+1) since I think that is more pythonic, as Andrew Jaffe points out in his comment. It does not slice any sublists so should be reasonably efficient.

One point of interest for newbies is that it uses the else clause on the for statement - this is not something I use very often but can be invaluable in situations like this.

This is identical to finding substrings in a string, so for large lists it may be more efficient to implement something like the Boyer-Moore algorithm.

Dave Kirby 2010-10-02 21:23:54

+1 for the note about efficient string searching algorithms. One disadvantage of yours is the addition of an interpreted inner loop (the slice comparison is, I imagine, faster, although the copy might offset that). I'm going to try a performance comparison.

Tim Yates 2010-10-02 21:47:04

After further tests, yours *is* the best so far for large subsequences. I would pick this, even despite the small disadvantage is has on smaller data sets.

Tim Yates 2010-10-02 22:52:58

Answer 4

A:

This works and is fairly fast since it does the linear searching using the builtin list.index() method and == operator:

def contains(sub, pri):
    M, N = len(pri), len(sub)
    i, LAST = 0, M-N+1
    while True:
        try:
            found = pri.index(sub[0], i, LAST) # find first elem in sub
        except ValueError:
            return False
        if pri[found:found+N] == sub:
            return [found, found+N-1]
        else:
            i = found+1

martineau 2010-10-02 21:33:39

Answer 5

A:

I tried to make this as efficient as possible.

It uses a generator; those unfamiliar with these beasts are advised to check out their documentation and that of yield expressions.

Basically it creates a generator of values from the subsequence that can be reset by sending it a true value. If the generator is reset, it starts yielding again from the beginning of sub.

Then it just compares successive values of sequence with the generator yields, resetting the generator if they don't match.

When the generator runs out of values, i.e. reaches the end of sub without being reset, that means that we've found our match.

Since it works for any sequence, you can even use it on strings, in which case it behaves similarly to str.find, except that it returns False instead of -1.

As a further note: I think that the second value of the returned tuple should, in keeping with Python standards, normally be one higher. i.e. "string"[0:2] == "st". But the spec says otherwise, so that's how this works.

It depends on if this is meant to be a general-purpose routine or if it's implementing some specific goal; in the latter case it might be better to implement a general-purpose routine and then wrap it in a function which twiddles the return value to suit the spec.

def reiterator(sub):
    """Yield elements of a sequence, resetting if sent ``True``."""
    it = iter(sub)
    while True:
        if (yield it.next()):
            it = iter(sub)

def find_in_sequence(sub, sequence):
    """Find a subsequence in a sequence.

    >>> find_in_sequence([2, 1], [-1, 0, 1, 2])
    False
    >>> find_in_sequence([-1, 1, 2], [-1, 0, 1, 2])
    False
    >>> find_in_sequence([0, 1, 2], [-1, 0, 1, 2])
    (1, 3)
    >>> find_in_sequence("subsequence",
    ...                  "This sequence contains a subsequence.")
    (25, 35)
    >>> find_in_sequence("subsequence", "This one doesn't.")
    False

    """
    start = None
    sub_items = reiterator(sub)
    sub_item = sub_items.next()
    for index, item in enumerate(sequence):
        if item == sub_item:
            if start is None: start = index
        else:
            start = None
        try:
            sub_item = sub_items.send(start is None)
        except StopIteration:
            # If the subsequence is depleted, we win!
            return (start, index)
    return False

intuited 2010-10-02 22:04:17

A valiant effort, but this has worse performance than either eumiro or Dave Kirby's solutions. 8.2s on the benchmark I described in the other comments.

Tim Yates 2010-10-02 22:21:40

Interesting to see the speed difference for native code. It seems like this algorithm would be relatively faster for longer subsequences.. how long was/were the subsequence(s) you used in the test?

intuited 2010-10-02 22:37:31

You're right. This performed much better than eumiro's solution with larger subsequences, but it still didn't perform as well as Dave's.

Tim Yates 2010-10-02 22:50:52

Answer 6

A:

Here's a straightforward algorithm that uses list methods:

#!/usr/bin/env python

def list_find(what, where):
    """Find `what` list in the `where` list.

    Return index in `where` where `what` starts
    or -1 if no such index.

    >>> f = list_find
    >>> f([2, 1], [-1, 0, 1, 2])
    -1
    >>> f([-1, 1, 2], [-1, 0, 1, 2])
    -1
    >>> f([0, 1, 2], [-1, 0, 1, 2])
    1
    >>> f([1,2], [-1, 0, 1, 2])
    2
    >>> f([1,3], [-1, 0, 1, 2])
    -1
    >>> f([1, 2], [[1, 2], 3])
    -1
    >>> f([[1, 2]], [[1, 2], 3])
    0
    """
    if not what: # empty list is always found
        return 0
    try:
        index = 0
        while True:
            index = where.index(what[0], index)
            if where[index:index+len(what)] == what:
                return index # found
            index += 1 # try next position
    except ValueError:
        return -1 # not found

def contains(what, where):
    """Return [start, end+1] if found else empty list."""
    i = list_find(what, where)
    return [i, i + len(what)] if i >= 0 else [] #NOTE: bool([]) == False

if __name__=="__main__":
    import doctest; doctest.testmod()

J.F. Sebastian 2010-10-02 23:44:58

Answer 7

A:

inspectorG4dget 2010-10-03 00:09:29

Answer 8

A:

I think this one is fast...

def issublist(subList, myList, start=0):
    if not subList: return 0
    lenList, lensubList = len(myList), len(subList)
    try:
        while lenList - start >= lensubList:
            start = myList.index(subList[0], start)
            for i in xrange(lensubList):
                if myList[start+i] != subList[i]:
                    break
            else:
                return start, start + lensubList
            start += 1
        return False
    except:
        return False

2010-10-03 07:10:55

Answer 9

A:

May I humbly suggest the Rabin-Karp algorithm if the big list is really big. The link even contains almost-usable code in almost-Python.

9000 2010-10-03 11:31:38

ansaurus

tags:

views:

answers:

Testing if a list contains another list with Python

related questions