ansaurus

Question

Answer 1

+3 A:

You're always going to have to have a loop - someone might come along with a clever one-liner that hides the loop within a call to map() or similar, but it's always going to be there.

My preference would always be to have clean and simple code, unless performance is a major factor.

Here's perhaps a more Pythonic version of your code:

data = [['a','b'], ['a','c'], ['b','d']]
search = 'c'
for sublist in data:
    if sublist[1] == search:
        print "Found it!", sublist
        break
# Prints: Found it! ['a', 'c']

It breaks out of the loop as soon as it finds a match.

(You have a typo, by the way, in ['b''d'].)

RichieHindle 2009-07-20 21:39:56

At this point in my python "career" I favor this approach since it is very easy to read. I might come back to try the other ones for performance, though. My list gets quite big at one point. Is there any place where I can compare performance of the different approaches?

greye 2009-07-21 03:21:34

Use the `timeit` module for performance testing of this kind of thing: http://docs.python.org/library/timeit.html

RichieHindle 2009-07-21 07:18:20
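
A minimal sketch of how such a comparison with `timeit` might look, for anyone who wants to try it. The test data (10,000 sublists with the match at the very end) and the labels are made up for illustration, and the code is written to run on Python 3 as well as Python 2. Timings will vary by machine, so none are shown.

```python
import timeit

# Hypothetical test data: 10,000 two-item sublists, match at the very end.
setup = """
data = [['a', str(i)] for i in range(10000)]
search = '9999'
"""

# Three of the approaches from this thread, as statements for timeit.
candidates = {
    "explicit loop": """
found = False
for sublist in data:
    if sublist[1] == search:
        found = True
        break
""",
    "any() with generator": "found = any(e[1] == search for e in data)",
    "in with generator": "found = search in (e[1] for e in data)",
}

results = {}
for label, stmt in candidates.items():
    # number=100 keeps the run short; raise it for more stable figures.
    results[label] = timeit.timeit(stmt, setup=setup, number=100)
    print("%-22s %.4f s" % (label, results[label]))
```

Placing the match at the end measures the worst case; moving it to the front would show how much the early exit saves.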

Answer 2

+4 A:

>>> my_list =[ ['a', 'b'], ['a', 'c'], ['b', 'd'] ]
>>> 'd' in (x[1] for x in my_list)
True

Editing to add:

Both David's answer using any and mine using in will end when they find a match since we're using generator expressions. Here is a test using an infinite generator to show that:

def mygen():
    ''' Infinite generator '''
    while True:
        yield 'xxx'  # Just to include a non-match in the generator
        yield 'd'

print 'd' in (x for x in mygen())     # True
print any('d' == x for x in mygen())  # True
# print 'q' in (x for x in mygen())     # Never ends if uncommented
# print any('q' == x for x in mygen())  # Never ends if uncommented

I just like simply using in instead of both == and any.

Anon 2009-07-20 21:43:54

That's what it's supposed to do.

Glenn Maynard 2009-07-20 21:50:09

I think any() is clearer, but I guess it's just personal preference. +1 then...

David Zaslavsky 2009-07-20 23:56:25

Answer 3

+6 A:

Nothing against RichieHindle's and Anon's answers, but here's how I'd write it:

data = [['a','b'], ['a','c'], ['b','d']]
search = 'c'
any(e[1] == search for e in data)

Like RichieHindle said, there is a hidden loop in the implementation of any (although I think it breaks out of the loop as soon as it finds a match).

David Zaslavsky 2009-07-20 21:47:27
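
The early exit mentioned above can be checked directly. This is only a sketch: `second_items` and `visited` are hypothetical helpers added to record which sublists get examined, and they are not part of the answer.

```python
data = [['a', 'b'], ['a', 'c'], ['b', 'd']]
search = 'c'
visited = []  # records every sublist the generator actually touches

def second_items(rows):
    """Yield each sublist's second item, recording which rows were examined."""
    for row in rows:
        visited.append(row)
        yield row[1]

found = any(item == search for item in second_items(data))
print(found)    # True
print(visited)  # [['a', 'b'], ['a', 'c']]; ['b', 'd'] was never examined
```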

Answer 4

+5 A:

>>> the_list =[ ['a','b'], ['a','c'], ['b','d'] ]
>>> any('c' == x[1] for x in the_list)
True

Brandon E Taylor 2009-07-20 21:47:41

Answer 5

+1 A:

>>> the_list =[ ['a','b'], ['a','c'], ['b','d'] ]
>>> "b" in zip(*the_list)[1]
True

zip() takes a bunch of lists and groups elements together by index, effectively transposing the list-of-lists matrix. The asterisk unpacks the contents of the_list and passes them to zip as arguments, so you're effectively passing the three lists separately, which is what zip wants. All that remains is to check whether "b" (or whatever) is in the tuple made up of the elements at the index you're interested in.

Markus 2009-07-20 23:48:55
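
To make the transposition described above visible, here is a small sketch that prints the intermediate result. The name `columns` is introduced just for illustration, and the `list()` call is only there so the same code also runs on Python 3, where zip returns an iterator that cannot be indexed.

```python
the_list = [['a', 'b'], ['a', 'c'], ['b', 'd']]

# zip(*the_list) is equivalent to zip(['a', 'b'], ['a', 'c'], ['b', 'd']).
# list() is only needed on Python 3, where zip returns an iterator.
columns = list(zip(*the_list))
print(columns)            # [('a', 'a', 'b'), ('b', 'c', 'd')]
print('b' in columns[1])  # True
```

Note that this builds every column before searching, which is why it cannot stop at the first match.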

Answer 6

+1 A:

Markus has one way to avoid using the word for -- here's another, which should have much better performance for long the_lists...:

import itertools
found = any(itertools.ifilter(lambda x: x[1] == 'b', the_list))

Alex Martelli 2009-07-21 00:01:46

Ah, good, Alex is here. ;-) Obviously, the gen exp's use the word 'for' - but if we allow that, interpreting the goal as avoiding the standard for loop structure instead of the word 'for' itself, how do all the answers given compare in terms of performance?

Anon 2009-07-21 00:18:35

@Anon, I have no time right now to run the usual -mtimeit thingies (OSCON is on ;-), but from previous experience I know that itertools tend to perform like greased lightning. All answers save Markus's stop at the first match so they're all equally fast in this sense.

Alex Martelli 2009-07-21 01:20:55

NP at all. Thanks. ;-)

Anon 2009-07-21 01:30:20

Answer 7

+1 A:

Nothing wrong with using a gen exp, but if the goal is to inline the loop...

>>> import itertools, operator
>>> 'b' in itertools.imap(operator.itemgetter(1), the_list)
True

Should be the fastest as well.

Coady 2009-07-21 03:25:56
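
For readers on Python 3, where itertools.imap and itertools.ifilter no longer exist, the built-in map() and filter() are lazy and can stand in for them. A sketch under that assumption (the variable names are made up for illustration):

```python
import operator

the_list = [['a', 'b'], ['a', 'c'], ['b', 'd']]

# On Python 3, map() is lazy, so the `in` test still stops at the first match.
found_with_map = 'b' in map(operator.itemgetter(1), the_list)

# filter() is likewise lazy; any() stops at the first sublist it yields
# (a non-empty list is truthy).
found_with_filter = any(filter(lambda x: x[1] == 'b', the_list))

print(found_with_map)     # True
print(found_with_filter)  # True
```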

Answer 8

+2 A:

The answers above all look good, but do you want to keep the result?

If so, you can use the following:

result = [element for element in data if element[1] == search]

Then a simple

len(result)

tells you whether anything was found (and now you can do something with the results).

Of course, this does not handle elements with fewer than two items. You should check for that unless you know every element always has at least two items, and in that case, should you be using tuples instead? (Tuples are immutable.)
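
One way to add the length check mentioned above is to test len() before indexing. This is only a sketch, and the sample data with short elements is made up for illustration:

```python
# Hypothetical data that includes elements too short to have a second item.
data = [['a', 'b'], ['a'], [], ['b', 'd']]
search = 'd'

# `and` short-circuits, so element[1] is only evaluated when it exists.
result = [element for element in data
          if len(element) > 1 and element[1] == search]
print(result)  # [['b', 'd']]
```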

If you know all items have the same fixed length, you can also unpack them. For two-item elements:

any(second == search for _, second in data)

or, for len(data[0]) == 4:

any(second == search for _, second, _, _ in data)

...and I would recommend using

for element in data:
   ...

instead of

for i in range(len(data)):
   ...

(This is for future use, unless you want to save or use 'i'. Also, just so you know, the '0' start argument to range is not required; you only need the full syntax if you are starting at a nonzero value.)

Terence Honles 2009-07-21 09:25:00
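
If the index is needed after all, enumerate() gives both the index and the element without going through range(len(data)). A minimal sketch, with `match_index` as a made-up name:

```python
data = [['a', 'b'], ['a', 'c'], ['b', 'd']]
search = 'c'

match_index = None  # stays None if nothing matches
for i, element in enumerate(data):
    if element[1] == search:
        match_index = i
        break

print(match_index)  # 1
```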

Python search in lists of lists