ansaurus

Question

Python - Most useful lists-comprehension construction

Answer 1

+12 A:

any and all have been part of standard Python since 2.5. There's no need to make your own versions of these. Also, the built-in versions of any and all short-circuit the evaluation where possible, giving a performance improvement. Your versions always iterate over the entire list.
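To make the short-circuiting concrete, here is a minimal sketch. The helper noisy and the seen_* lists are hypothetical names used only to record how far iteration got; they are not part of the question or of the standard library.

```python
def noisy(values, seen):
    """Yield each value, recording it in `seen` so we can tell how far iteration got."""
    for v in values:
        seen.append(v)
        yield v

seen_any = []
result_any = any(x > 2 for x in noisy([1, 2, 3, 4, 5], seen_any))
# any() stops at the first truthy result, so 4 and 5 are never requested.

seen_all = []
result_all = all(x < 2 for x in noisy([1, 2, 3, 4, 5], seen_all))
# all() stops at the first falsy result, which here is the value 2.
```

After running this, seen_any holds only the items up to the first match and seen_all only the items up to the first failure.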

If you want a version that accepts a predicate, use something like this that leverages the existing any and all functions:

def anyWithPredicate(predicate, l):
    return any(predicate(x) for x in l)

def allWithPredicate(predicate, l):
    return all(predicate(x) for x in l)

I don't particularly see the need for these functions though, as they don't really save much typing.
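For comparison, a small sketch of the helper next to the plain built-in call. The words list and the length test are made-up sample data, chosen only to show that the two spellings are about the same size.

```python
def anyWithPredicate(predicate, l):
    return any(predicate(x) for x in l)

words = ["apple", "kiwi", "banana"]  # hypothetical sample data

# Using the helper with an explicit predicate function.
via_helper = anyWithPredicate(lambda w: len(w) > 5, words)

# Using the built-in directly with a generator expression.
inline = any(len(w) > 5 for w in words)
```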

Also, hiding existing standard Python functions with your own functions that have the same name but different behaviour is a bad practice.

Mark Byers 2009-11-23 15:48:23

Totally wrong. Probably I've selected confusing names. In Python, all and any do totally different things! Python's built-in all returns True if bool(x) is True for all values x in the iterable. My functions take 2 arguments - a function and a list.

psihodelia 2009-11-23 15:54:36

Predicate is a .NET term for closures. This is the first time I remember seeing it used in Python. Really, anyWithPredicate is a non-standard naming convention for Python in general.

Bryan McLemore 2009-11-23 16:00:14

From http://docs.python.org/library/itertools.html "itertools.dropwhile(predicate, iterable)"

Mark Byers 2009-11-23 16:04:34

And I agree anyWithPredicate is probably not the best name. Perhaps anyWhere would be better? Name suggestions are welcome...

Mark Byers 2009-11-23 16:08:50

every and some: I have already edited the post.

psihodelia 2009-11-23 16:12:46

@Bryan McLemore: "Predicate" is not a .NET term (predicates were being used in LISP when Anders Hejlsberg was a child), and it's not a .NET term for closures either.

Robert Rossney 2009-11-23 17:44:27

Are you freaking kidding me? "Predicate" in its current meaning has been in use since the 19th century.

hop 2009-11-23 18:03:49

Answer 2

+4 A:

This solution shadows builtins, which is generally a bad idea. However, the usage feels fairly Pythonic, and it preserves the original functionality.

Note that there are several ways to potentially optimize this, based on testing, including moving the imports out to the module level and changing f's default to None and testing for it, instead of using a default lambda as I did.

def any(l, f=lambda x: x):
    from __builtin__ import any as _any
    return _any(f(x) for x in l)

def all(l, f=lambda x: x):
    from __builtin__ import all as _all
    return _all(f(x) for x in l)

Just putting that out there for consideration and to see what people think of doing something so potentially dirty.
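For what it's worth, the optimizations mentioned above might look roughly like the sketch below. It is only one possible reading, not tested for speed, and it assumes Python 3, where the module is called builtins rather than __builtin__. The parameter name iterable is an illustrative rename of l.

```python
import builtins  # Python 3 name; Python 2 calls this module __builtin__

def any(iterable, f=None):
    """Like the built-in any, but optionally apply f to each item first."""
    if f is None:
        # No function given: defer straight to the built-in.
        return builtins.any(iterable)
    return builtins.any(f(x) for x in iterable)

def all(iterable, f=None):
    """Like the built-in all, but optionally apply f to each item first."""
    if f is None:
        return builtins.all(iterable)
    return builtins.all(f(x) for x in iterable)
```

Called with one argument, these behave like the builtins; called with two, they apply f first, e.g. all([2, 4], lambda x: x % 2 == 0).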

Bryan McLemore 2009-11-23 16:06:35

This is how I always wish the builtins worked.

Will McCutchen 2009-11-23 16:31:59

Answer 3

+5 A:

There aren't all that many cases where a list comprehension (LC for short) will be substantially more useful than the equivalent generator expression (GE for short, i.e., using round parentheses instead of square brackets, to generate one item at a time rather than "all in bulk at the start").

Sometimes you can get a little extra speed by "investing" the extra memory to hold the list all at once, depending on vagaries of optimization and garbage collection on one or another version of Python, but that hardly amounts to substantial extra usefulness of LC vs GE.

Essentially, to get substantial extra use out of the LC as compared to the GE, you need use cases which intrinsically require "more than one pass" on the sequence. In such cases, a GE would require you to generate the sequence once per pass, while, with an LC, you can generate the sequence once, then perform multiple passes on it (paying the generation cost only once). Multiple generation may also be problematic if the GE / LC are based on an underlying iterator that's not trivially restartable (e.g., a "file" that's actually a Unix pipe).
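A minimal sketch of that single-pass behaviour, using a made-up numbers list in place of a real data source:

```python
numbers = ["1.5", "2.5", "4.0"]  # hypothetical sample input

G = (float(s) for s in numbers)
first_pass_g = sum(G)    # consumes the generator
second_pass_g = sum(G)   # nothing left, so this is the sum of an empty sequence

L = [float(s) for s in numbers]
first_pass_l = sum(L)
second_pass_l = sum(L)   # the list can be traversed as often as needed
```

The second pass over G silently yields nothing, whereas L gives the same total both times.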

For example, say you are reading a non-empty open text file f which has a bunch of (textual representations of) numbers separated by whitespace (including newlines here and there, empty lines, etc.). You could transform it into a sequence of numbers with either a GE:

G = (float(s) for line in f for s in line.split())

or an LC:

L = [float(s) for line in f for s in line.split()]

Which one is better? Depends on what you're doing with it (i.e., the use case!). If all you want is, say, the sum, sum(G) and sum(L) will do just as well. If you want the average, sum(L)/len(L) is fine for the list, but won't work for the generator -- given the difficulty in "restarting f", to avoid an intermediate list you'll have to do something like:

tot = 0.0
for i, x in enumerate(G): tot += x
return tot/(i+1)

nowhere near as snappy, fast, concise and elegant as return sum(L)/len(L).
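Wrapped up as a self-contained function, the one-pass average could be sketched as follows. The name mean_one_pass is hypothetical, and an io.StringIO object stands in for the open text file f so the example runs without a real file.

```python
import io

def mean_one_pass(values):
    """Average an iterable in a single pass, without building a list."""
    total = 0.0
    count = 0
    # Start counting at 1 so `count` ends up as the number of items seen.
    for count, x in enumerate(values, 1):
        total += x
    if count == 0:
        raise ValueError("mean of an empty sequence")
    return total / count

# Stand-in for an open text file with whitespace-separated numbers.
f = io.StringIO("1 2\n\n3.5  4.5\n")
G = (float(s) for line in f for s in line.split())
average = mean_one_pass(G)
```

Unlike the inline loop, this version also handles an empty input explicitly instead of failing on an undefined loop variable.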

Remember that sorted(G) does return a list (inevitably), so L.sort() (which is in-place) is the rough equivalent in this case -- sorted(L) would be supererogatory (as now you have two lists). So when sorting is needed a generator may often be preferred simply due to conciseness.
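A quick sketch of the difference, with a made-up data string as input:

```python
data = "3 1 2"  # hypothetical sample input

G = (float(s) for s in data.split())
L = [float(s) for s in data.split()]

from_generator = sorted(G)   # builds and returns a new list
in_place_result = L.sort()   # sorts L itself and returns None
```

Either way exactly one list exists at the end: from_generator in the first case, L in the second.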

All in all, since L is identically equivalent to list(G), it's hard to get very excited about the ability to express it via punctuation (square brackets instead of round parentheses) instead of a single, short, pronounceable and obvious word like list ;-). And that's all an LC is -- a punctuation-based syntax shortcut for list(some_genexp)...!

Alex Martelli 2009-11-23 16:38:34

Answer 4

+1 A:

For your information, the documentation for the itertools module in Python 3.x lists some pretty nice generator functions.
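A few of them, as a small sketch (the sample values are arbitrary; the variable names are chosen only for illustration):

```python
from itertools import chain, count, islice, takewhile

# islice takes a bounded slice of a possibly infinite iterator.
first_squares = list(islice((n * n for n in count(1)), 5))

# takewhile yields items until the predicate first fails.
small = list(takewhile(lambda n: n < 10, count(1, 3)))

# chain walks several iterables one after another.
joined = list(chain("ab", [1, 2]))
```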

Adrien Plisson 2009-11-23 16:44:59
