Which user-made list-comprehension construct in Python is the most useful?

I have created the following two quantifiers, which I use to do different verification operations:

def every(f, L): return not (False in [f(x) for x in L])
def some(f, L): return True in [f(x) for x in L]

An optimized version (requires Python 2.5+) was proposed below:

def every(f, L): return all(f(x) for x in L)
def some(f, L): return any(f(x) for x in L)

So, how does it work?

"""For all x in [1,4,9] there exists such y from [1,2,3] that x = y**2"""
answer = every(lambda x: some(lambda y: y**2 == x, [1,2,3]), [1,4,9])

Using such operations, you can easily do smart verifications, like:

"""There exists at least one bot in a room which has a life below 30%"""
answer = some(lambda x: x.life < 0.3, bots_in_this_room)

And so on: such quantifiers let you answer even very complicated questions. Of course, there are no infinite lists in Python (hey, it's not Haskell :) ), but Python's list comprehensions are very practical.
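Although lists are finite, the generator-based some also works on an unbounded iterator, provided a match exists, because any stops at the first true result. The sketch below is an illustration that is not part of the original post; it uses itertools.count as a stand-in for an infinite sequence.

```python
from itertools import count

def some(f, L):
    # Generator-based version: any() stops at the first truthy result.
    return any(f(x) for x in L)

# count(1) yields 1, 2, 3, ... without end; the search stops at n = 8.
found = some(lambda n: n * n > 50, count(1))
print(found)  # True
```

If no element satisfied the predicate, this call would never return, so it is only safe when a match is known to exist.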

Do you have your own favourite list-comprehension constructs?

PS: I wonder why most people tend not to answer the question but to criticize the presented examples. The question is actually about favourite list-comprehension constructs.

+12  A: 

any and all have been part of standard Python since 2.5. There's no need to make your own versions of these. Also, the built-in versions of any and all short-circuit the evaluation where possible, giving a performance improvement. Your versions always iterate over the entire list.
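The difference in evaluation can be made visible with a predicate that records its calls. This is only a minimal sketch: every_list, every_gen and make_counting_predicate are hypothetical names chosen for illustration, with every_list mirroring the list-based version from the question.

```python
def every_list(f, L):
    # List-based version: f is applied to every element before the test.
    return not (False in [f(x) for x in L])

def every_gen(f, L):
    # Generator-based version: all() stops at the first falsy result.
    return all(f(x) for x in L)

def make_counting_predicate():
    # Returns a predicate plus the list in which it records each call.
    calls = []
    def is_positive(x):
        calls.append(x)
        return x > 0
    return is_positive, calls

data = [-1, 2, 3, 4]

pred, calls_list = make_counting_predicate()
result_list = every_list(pred, data)

pred, calls_gen = make_counting_predicate()
result_gen = every_gen(pred, data)

print(result_list, len(calls_list))  # False 4
print(result_gen, len(calls_gen))    # False 1
```

Both versions give the same answer, but the generator-based one calls the predicate only until the outcome is decided.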

If you want a version that accepts a predicate, use something like this that leverages the existing any and all functions:

def anyWithPredicate(predicate, l): return any(predicate(x) for x in l) 
def allWithPredicate(predicate, l): return all(predicate(x) for x in l)

I don't particularly see the need for these functions though, as it doesn't really save much typing.

Also, hiding existing standard Python functions with your own functions that have the same name but different behaviour is a bad practice.

Mark Byers
Totally wrong. Probably I've selected confusing names. In Python, all and any do totally different things! Python's built-in all returns True if bool(x) is True for all values x in the iterable. My functions take 2 arguments - a function and a list.
psihodelia
Predicate is a .NET term for closures. This is the first time I remember seeing it used in Python. Really, anyWithPredicate is a nonstandard naming convention for Python in general.
Bryan McLemore
From http://docs.python.org/library/itertools.html "itertools.dropwhile(predicate, iterable)"
Mark Byers
And I agree anyWithPredicate is probably not the best name. Perhaps anyWhere would be better? Name suggestions are welcome...
Mark Byers
every and some : I have already edited the post
psihodelia
@Bryan McLemore: "Predicate" is not a .NET term (predicates were being used in LISP when Anders Hejlsberg was a child), and it's not a .NET term for closures either.
Robert Rossney
Are you freaking kidding me? "Predicate" in its current meaning has been in use since the 19th century.
hop
+4  A: 

This solution shadows builtins, which is generally a bad idea. However, the usage feels fairly pythonic, and it preserves the original functionality.

Note that there are several ways to potentially optimize this, depending on testing, including moving the imports out to the module level and changing f's default to None and testing for it, instead of using a default lambda as I did.

def any(l, f=lambda x: x):
    from __builtin__ import any as _any
    return _any(f(x) for x in l)

def all(l, f=lambda x: x):
    from __builtin__ import all as _all
    return _all(f(x) for x in l)

Just putting that out there for consideration and to see what people think of doing something so potentially dirty.
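For reference, the two optimizations mentioned above might look like the following sketch. It assumes Python 3, where the module is named builtins rather than __builtin__, and it uses the hypothetical names any_where and all_where so that the builtins are not shadowed. Treat it as an illustration rather than a tested replacement.

```python
import builtins  # Python 3 name; the code above uses Python 2's __builtin__

def any_where(iterable, f=None):
    # With no function given, behave exactly like the builtin any().
    if f is None:
        return builtins.any(iterable)
    return builtins.any(f(x) for x in iterable)

def all_where(iterable, f=None):
    # With no function given, behave exactly like the builtin all().
    if f is None:
        return builtins.all(iterable)
    return builtins.all(f(x) for x in iterable)

print(any_where([0, 0, 3]))                   # True
print(all_where([1, 2, 3], lambda x: x < 3))  # False
```

Testing for None avoids one extra function call per element in the common case where no function is passed.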

Bryan McLemore
This is how I always wish the builtins worked.
Will McCutchen
+5  A: 

There aren't all that many cases where a list comprehension (LC for short) will be substantially more useful than the equivalent generator expression (GE for short, i.e., using round parentheses instead of square brackets, to generate one item at a time rather than "all in bulk at the start").

Sometimes you can get a little extra speed by "investing" the extra memory to hold the list all at once, depending on vagaries of optimization and garbage collection on one or another version of Python, but that hardly amounts to substantial extra usefulness of LC vs GE.

Essentially, to get substantial extra use out of the LC as compared to the GE, you need use cases which intrinsically require "more than one pass" on the sequence. In such cases, a GE would require you to generate the sequence once per pass, while, with an LC, you can generate the sequence once, then perform multiple passes on it (paying the generation cost only once). Multiple generation may also be problematic if the GE / LC are based on an underlying iterator that's not trivially restartable (e.g., a "file" that's actually a Unix pipe).
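A small sketch of the "more than one pass" point, using made-up data rather than anything from a real use case: a generator expression is exhausted after its first pass, while a list can be traversed again.

```python
squares_gen = (x * x for x in range(5))
squares_list = [x * x for x in range(5)]

first_gen = sum(squares_gen)
second_gen = sum(squares_gen)    # the generator is already exhausted

first_list = sum(squares_list)
second_list = sum(squares_list)  # the list can be summed again

print(first_gen, second_gen)     # 30 0
print(first_list, second_list)   # 30 30
```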

For example, say you are reading a non-empty open text file f which has a bunch of (textual representations of) numbers separated by whitespace (including newlines here and there, empty lines, etc.). You could transform it into a sequence of numbers with either a GE:

G = (float(s) for line in f for s in line.split())

or an LC:

L = [float(s) for line in f for s in line.split()]

Which one is better? Depends on what you're doing with it (i.e., the use case!). If all you want is, say, the sum, sum(G) and sum(L) will do just as well. If you want the average, sum(L)/len(L) is fine for the list, but won't work for the generator -- given the difficulty in "restarting f", to avoid an intermediate list you'll have to do something like:

def average(G):  # wrapper name is illustrative; G must be non-empty
    tot = 0.0
    for i, x in enumerate(G): tot += x
    return tot/(i+1)

nowhere near as snappy, fast, concise and elegant as return sum(L)/len(L).
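To try both approaches without a real file, here is a sketch that assumes an in-memory io.StringIO object as a stand-in for the open text file f. The sample text is invented for the example.

```python
import io

text = "1 2.5\n\n3\n  4.5 9\n"

# List comprehension: two passes (sum and len) over the stored values.
f = io.StringIO(text)
L = [float(s) for line in f for s in line.split()]
avg_list = sum(L) / len(L)

# Generator expression: a single pass, keeping the count by hand.
f = io.StringIO(text)
G = (float(s) for line in f for s in line.split())
tot = 0.0
n = 0
for x in G:
    tot += x
    n += 1
avg_gen = tot / n

print(avg_list, avg_gen)  # 4.0 4.0
```

The generator version never holds more than one number at a time, at the cost of the explicit loop.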

Remember that sorted(G) does return a list (inevitably), so L.sort() (which is in-place) is the rough equivalent in this case -- sorted(L) would be supererogatory (as now you have two lists). So when sorting is needed a generator may often be preferred simply due to conciseness.
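As a rough illustration with made-up data, the two sorting spellings look like this:

```python
words = ["pear", "fig", "banana"]

# sorted() accepts any iterable, including a generator, and returns a new list.
by_length = sorted((w.upper() for w in words), key=len)

# With a list comprehension, sorting in place avoids building a second list.
upper = [w.upper() for w in words]
upper.sort(key=len)

print(by_length)           # ['FIG', 'PEAR', 'BANANA']
print(upper == by_length)  # True
```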

All in all, since L is identically equivalent to list(G), it's hard to get very excited about the ability to express it via punctuation (square brackets instead of round parentheses) instead of a single, short, pronounceable and obvious word like list ;-). And that's all an LC is -- punctuation-based syntax shortcut for list(some_genexp)...!

Alex Martelli
+1  A: 

For your information, the documentation for the itertools module in Python 3.x lists some pretty nice generator functions.
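A few of them, as a brief sketch (the selection here is arbitrary and the inputs are invented):

```python
from itertools import chain, count, islice, takewhile

# islice: take the first five values of an endless counter.
first_five = list(islice(count(10), 5))

# takewhile: keep items until the predicate first fails.
small = list(takewhile(lambda x: x < 4, [1, 2, 3, 5, 1]))

# chain: iterate over several iterables as if they were one.
joined = list(chain("ab", [1, 2]))

print(first_five)  # [10, 11, 12, 13, 14]
print(small)       # [1, 2, 3]
print(joined)      # ['a', 'b', 1, 2]
```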

Adrien Plisson