views:

306

answers:

7

Im a trying to find a sublist of a list. Meaning if list1 say [1,5] is in list2 say [1,4,3,5,6] than it should return True. What I have so far is this:

for nums in l1:
    if nums in l2:
        return True
    else:
        return False

This would be true but I'm trying to return True only if list1 is in list2 in the respective order. So if list2 is [5,2,3,4,1], it should return False. I was thinking along the lines of comparing the index values of list1 using < but I'm not sure.

+2  A: 

This question looks a bit like homework and for this reason I'd like to take the time and discuss what may be going wrong with the snippet shown in the question.

Although you are using a word in its plural form, for the nums variable, you need to understand that Python will use this variable to store ONE item from l1 at a time, and go through the block of code in this "for block", one time for each different item.

The result of your current snippet will therefore be to exit upon the very first iteration, with either True or False depending if by chance the first items in the list happen to match.

Edit: Yes, A1, exactly as you said: the logic exits with True after the first iteration. This is because of the "return" when nums is found in l2.
If you were to do nothing in the "found" case, the loop the logic would proceed with finishing whatever logic in the block (none here) and it would then start the next iteration. Therefore it would only exit with a "False" return value, in the case when an item from l1 is not found l2 (indeed after the very first such not-found item). Therefore your logic is almost correct (if it were to do nothing in the "found case"), the one thing missing would be to return "True", systematically after the for loop (since if it didn't exit with a False value within the loop, then all items of l2 are in l1...).

There are two ways to modify the code so it does nothing for the "found case".
- by using pass, which is a convenient way to instruct Python to do nothing;
"pass" is typically used when "something", i.e. some action is syntactically required but we don't want anything done, but it can also be used when debugging etc.
- by rewriting the test as a "not in" instead

if nums not in l2:
   return False
#no else:, i.e. do nothing at all if found

Now... Getting into more details.
There may be a flaw in your program (with the suggested changes), that is that it would consider l1 to be a sublist of l2, even if l1 had say 2 items with value say 5 whereby l2 only had one such value. I'm not sure if that kind of consideration is part of the problem (possibly the understanding is that both lists are "sets", with no possible duplicate items). If duplicates were allowed however, you would have to complicate the logic somewhat (a possible approach would be to intitially make a copy of l2 and each time "nums" is find in the l2 copy, to remove this item.

Another consideration is that maybe a list can only be said to be a sublist if its items are found the same order as the items in the other list... Again it all depends on the way the problem is defined... BTW some of the solutions proposed, like Alex Martelli's are written in such fashion because they solve the problem in a way that the order of items with the lists matter.

mjv
Yes, this is for homework and you are right. Giving me the answer doesn't really help me because I just started learning python so I don't really know how the codes given work. As for your comment mjv on my code, are you saying that it would look just at 1 in list1, return true, and skip looking at 5? Should I be using something like range(len(list1))?
Al
+7  A: 
try:
  last_found = -1
  for num in L1:
     last_found = L2.index(num, last_found + 1)
  return True
except ValueError:
  return False

The index method of list L2 returns the position at which the first argument (num) is found in the list; called, like here, with a second arg, it starts looking in the list at that position. If index does not find what it's looking for, it raises a ValueError exception.

So, this code uses this approach to look for each item num of L1, in order, inside L2. The first time it needs to start looking from position 0; each following time, it needs to start looking from the position just after the last one where it found the previous item, i.e. last_found + 1 (so at the start we must set last_found = -1 to start looking from position 0 the first time).

If every item in L1 is found this way (i.e. it's found in L2 after the position where the previous item was found), then the two lists meet the given condition and the code returns True. If any item of L1 is ever not-found, the code catches the resulting ValueError exception and just returns False.

A different approach would be to use iterators over the two lists, that can be formed with the iter built-in function. You can "advance" an iterator by calling built-in next on it; this will raise StopIteration if there is no "next item", i.e., the iterator is exhausted. You can also use for on the iterator for a somewhat smoother interface, where applicable. The low-level approach using the iter/next idea:

i1 = iter(L1)
i2 = iter(L2)
while True:
  try:
    lookfor = next(i1)
  except StopIteration:
    # no more items to look for == all good!
    return True
  while True:
    try:
      maybe = next(i2)
    except StopIteration:
      # item lookfor never matched == nope!
      return False
    if maybe == lookfor:
      break

or, a bit higher-level:

i1 = iter(L1)
i2 = iter(L2)
for lookfor in i1:
  for maybe in i2:
    if maybe == lookfor:
      break
  else:
    # item lookfor never matched == nope!
    return False
# no more items to look for == all good!
return True

In fact, the only crucial use of iter here is to get i2 -- having the inner loop as for maybe in i2 guarantees the inner loop won't start looking from the beginning every time, but, rather, it will keep looking where it last left off. The outer loop might as well for for lookfor in L1:, since it has no "restarting" issue.

Key, here, is the else: clause of loops, which triggers if, and only if, the loop was not interrupted by break but rather exited naturally.

Working further on this idea we are again reminded of the in operator, which also can be made to continue where it last left off simply by using an iterator. Big simplification:

i2 = iter(L2)
for lookfor in L1:
  if lookfor not in i2:
    return False
# no more items to look for == all good!
return True

But now we recognize that is exactly the patter abstracted by the short-circuiting any and all built-in "short-circuiting accumulator" functions, so...:

i2 = iter(L2)
return all(lookfor in i2 for lookfor in L1)

which I believe is just about as simple as you can get. The only non-elementary bit left here is: you need to use an iter(L2) explicitly, just once, to make sure the in operator (intrinsically an inner loop) doesn't restart the search from the beginning but rather continues each time from where it last left off.

Alex Martelli
it can be improved a bit by having single line last_found = L2.index(num, last_found + 1), and moving try/except out of loop
Anurag Uniyal
@Anurag, good point, let me edit to include it.
Alex Martelli
...and actually I thought of an alternative iterator-based solution and pushed it to where I believe it definitely looks optimal, look at my edited answer for that development.
Alex Martelli
I've put the final solution at the top of the answer.
J.F. Sebastian
@J.F.: Don't; it's worth reading.
Jed Smith
I didn't know you can do `elem in iterator` and it walks through the iter. Kinda inconsistent since `len()` refuses to do the same.
THC4k
`len(y)` is defined to **not** "consume" its iterator argument `y` (if it did it would have to do so entirely); `if x in y` (like `for x in y`) **is** defined to consume whatever part of `y` it needs -- and the fact that it can be done in part and thus be **useful** is the key to understanding why this bit of the specs should be different, and is.
Alex Martelli
A: 

I have a feeling this is more intensive than Alex's answer, but here was my first thought:

def test(list1, list2):
    try:
        for num in list1:
            list2 = list2[list2.index(num):]
    except: return False
    else: return True

Edit: Just tried it. His is faster. It's close.

Edit 2: Moved try/except out of the loop (this is why others should look at your code). Thanks, gnibbler.

Jed Smith
you can also move the try/except out of the loop
gnibbler
+2  A: 

I think this solution is the fastest, since it iterates only once, albeit on the longer list and exits before finishing the iteration if a match is found. (Edit: However, it is not as succinct or as fast as Alex's latest solution)

def ck(l1,l2):
    i,j = 0,len(l1)
    for e in l2:
     if e == l1[i]:
      i += 1
     if i == j:
      return True
    return False

An improvement was suggested by Anurag Uniyal (see comment) and is reflected in the showdown below.

Here are some speed results for a range of list size ratios (List l1 is a 10-element list containing random values from 1-10. List l2 ranges from 10-1000 in length (and also contain random values from 1-10).

Code that compares run times and plots the results:

import random
import os
import pylab
import timeit

def bpowah(l1,l2):
    i = 0
    j = len(l1)
    try:
        for e in l2:
            if e == l1[i]:
                i += 1
    except IndexError: # thanks Anurag
        return True
    return False

def jed(list1, list2):
    try:
        for num in list1:
            list2 = list2[list2.index(num):]
    except: return False
    else: return True

def alex(L1,L2):  # wow!
    i2 = iter(L2)
    return all(lookfor in i2 for lookfor in L1)

from itertools import dropwhile
from operator import ne
from functools import partial

def thc4k_andrea(l1, l2):
    it = iter(l2)
    try:
        for e in l1:
            dropwhile(partial(ne, e), it).next()
        return True
    except StopIteration:
        return False


ct = 100
ss = range(10,1000,100)
nms = 'bpowah alex jed thc4k_andrea'.split()
ls = dict.fromkeys(nms)
for nm in nms:
    ls[nm] = []

setup = 'import test_sublist as x'
for s in ss:
    l1 = [random.randint(1,10) for i in range(10)]
    l2 = [random.randint(1,10) for i in range(s)]
    for nm in nms:
        stmt = 'x.'+nm+'(%s,%s)'%(str(l1),str(l2))
        t = timeit.Timer(setup=setup, stmt=stmt).timeit(ct)
        ls[nm].append( t )

pylab.clf()
for nm in nms:
    print len(ss), len(ls[nm])
    pylab.plot(ss,ls[nm],label=nm)
    pylab.legend(loc=0)

    pylab.xlabel('length of l2')
    pylab.ylabel('time')

pylab.savefig('cmp_lsts.png')
os.startfile('cmp_lsts.png')

results: alt text

bpowah
i,j = 0,len(l1)does this line mean i=0 and j=len(l1)?
Al
Alex = 1.72043895721, you = 1.70730209351. Not enough of a difference to render it better, as it's a lot less readable.
Jed Smith
Yes, it does, Al
bpowah
Sorry, Jed. I'm a speed freak. Plus, iterating over l2 wins with significant margin if the match to l1 is found in early in l2.
bpowah
+1 for looping once
Anurag Uniyal
@bpowah, may be you will gain a bit more if instead of i==j check, you catch IndexError exception
Anurag Uniyal
Nice graphics :)
Geo
"Simple is better than complex, good work alex! And nice test code bpowah|
Andrea Ambu
A: 

I have a hard time seeing questions like this and not wishing that Python's list handling was more like Haskell's. This seems a much more straightforward solution than anything I could come up with in Python:

contains_inorder :: Eq a => [a] -> [a] -> Bool
contains_inorder [] _ = True
contains_inorder _ [] = False
contains_inorder (x:xs) (y:ys) | x == y    = contains_inorder xs ys
                               | otherwise = contains_inorder (x:xs) ys
Corey Porter
If Haskell were used for pseudocode, we'd all be doomed :)
Geo
I don't really see the point of this answer.
Ionuț G. Stan
More of an aside than an answer, really.
Corey Porter
+1  A: 

This should be easy to understand and avoid corner case nicely as you don't need to work with indexes:

def compare(l1, l2):
    it = iter(l2)
    for e in l1:
        try:
            while it.next() != e: pass
        except StopIteration: return False
    return True

it tries to compare each element of l1 to the next element in l2.
if there is no next element (StopIteration) it returns false (it visited the whole l2 and didn't find the current e) else it found it, so it returns true.

For faster execution it may help to put the try block outside the for:

def compare(l1, l2):
    it = iter(l2)
    try: 
        for e in l1:
            while it.next() != e: pass
    except StopIteration: return False
    return True
Andrea Ambu
+1 I wanted to write the same thing and your `while it.next() != e: pass` is pretty nice.
THC4k
A: 

The ultra-optimized version of Andrea's solution:

from itertools import dropwhile
from operator import ne
from functools import partial

def compare(l1, l2):
    it = iter(l2)
    try:
        for e in l1:
            dropwhile(partial(ne, e), it).next()
        return True
    except StopIteration:
        return False

This can be written even more functional style:

def compare(l1,l2):
    it = iter(l2)
    # any( True for .. ) because any([0]) is False, which we don't want here
    return all( any(True for _ in dropwhile(partial(ne, e), it)) for e in l1 )
THC4k
I'm not sure about the reason, but using bpowah's test code the curve of the function that doesn't use itertools (the one I posted) is 99% of the time under the curve of the code using dropwhile. I don't know what partial and dropwhile do, I'm going to google for it.
Andrea Ambu
`partial(ne, e)` is basically `lambda x: x != e` and `dropwhile( pred, seq)` is your `while pred(seq.next()): pass`
THC4k