I have a function to pick out lumps from a list of strings and return them as another list:

import re

def filterPick(lines, regex):
    result = []
    for l in lines:
        match = re.search(regex, l)
        if match:
            # keep only the first captured group of each matching line
            result.append(match.group(1))
    return result

Is there a way to reformulate this as a list comprehension? Obviously it's fairly clear as is; just curious.


Thanks to those who contributed, special mention for @Alex. Here's a condensed version of what I ended up with; the regex match method is passed to filterPick as a "pre-hoisted" parameter:

import re

def filterPick(list,filter):
    return [ ( l, m.group(1) ) for l in list for m in (filter(l),) if m]

theList = ["foo", "bar", "baz", "qurx", "bother"]
searchRegex = re.compile('(a|r$)').search
x = filterPick(theList,searchRegex)

>> [('bar', 'a'), ('baz', 'a'), ('bother', 'r')]
+8  A: 
[m.group(1) for l in lines for m in [regex.search(l)] if m]

The "trick" is the for m in [regex.search(l)] part -- that's how you "assign" a value that you need to use more than once, within a list comprehension -- add just such a clause, where the object "iterates" over a single-item list containing the one value you want to "assign" to it. Some consider this stylistically dubious, but I find it practical sometimes.
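As an illustration of the single-item-list clause described above, here is a minimal sketch. The sample `lines` and the pattern are hypothetical, chosen only to show that the comprehension gives the same result as the explicit loop.

```python
import re

# Hypothetical sample data, chosen only for illustration.
lines = ["foo=1", "no match here", "bar=22"]
regex = re.compile(r"=(\d+)")

# "for m in [regex.search(l)]" iterates over a one-item list, so m is bound
# to the match object (or None) exactly once per line.
picked = [m.group(1) for l in lines for m in [regex.search(l)] if m]

# The equivalent explicit loop, for comparison.
expected = []
for l in lines:
    m = regex.search(l)
    if m:
        expected.append(m.group(1))

print(picked)  # ['1', '22']
```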

Alex Martelli
Alex, I like that; thanks and +1. I have some fairly heavy lifting to do with this code - should I worry about the extra overhead of setting-up and tearing-down the "faux iterator"? BTW I subscribe to the doctrine of "optimise later".
Brent.Longborough
@Brent, the overhead of the "faux iterator" should be negligible compared with the search call. One minor optimization is to use `(regex.search(l),)` in lieu of `[regex.search(l)]` (which I find more readable but is minutely slower). I thought you couldn't possibly be in a hurry, as you were actually calling the `re.search` function from the module rather than the re object's method. Pulling `regex.search` as a bound method outside of the listcomp is another minor but useful optimization, btw.
Alex Martelli
@Alex, as soon as I saw your answer I realised that using re.search was not the best way to go. Could you clarify for me how you would "[pull the] regex.search as a bound method outside of the listcomp"? I really appreciate your patience with a listcomp and Python noob.
Brent.Longborough
+1 for the 'assignment' bit. I didn't know that.
jeffjose
@Brent, `src=regex.search; lst=[m.group(1) for l in lines for m in [src(l)] if m]` is the "bound method hoisting" optimization: it does the method lookup once instead of redoing it for each line. Python doesn't hoist attribute lookups for you, but when you need such an optimization you can do it manually, as I just showed.
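Spelled out over several lines, the hoisting might look like the sketch below. The sample data and the names `unhoisted` and `hoisted` are assumptions added for illustration; both forms produce the same list and differ only in how often the attribute lookup happens.

```python
import re

lines = ["alpha 10", "beta", "gamma 7"]  # hypothetical sample data
regex = re.compile(r"(\d+)")

# Unhoisted: the attribute regex.search is looked up on every iteration.
unhoisted = [m.group(1) for l in lines for m in (regex.search(l),) if m]

# Hoisted: the bound method is looked up once and reused for every line.
src = regex.search
hoisted = [m.group(1) for l in lines for m in (src(l),) if m]

print(hoisted)  # ['10', '7']
```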
Alex Martelli
+1  A: 
return [m.group(1) for m in (re.search(regex, l) for l in lines) if m]
Ignacio Vazquez-Abrams
A: 

It could be shortened a little

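import re

def filterPick(lines, regex):
    # use search (not match) so the pattern may occur anywhere in the line,
    # as in the original loop
    matches = map(re.compile(regex).search, lines)
    return [m.group(1) for m in matches if m]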

You could put it all in one line, but that would mean matching every line twice, which would be a bit less efficient.
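For comparison, a one-line version that searches each line twice might look like this sketch. The function name `filter_pick_twice` is hypothetical, and the sample list is borrowed from the question.

```python
import re

def filter_pick_twice(lines, regex):
    # re.search runs once in the filter and again to extract the group,
    # so every matching line is searched twice.
    return [re.search(regex, l).group(1) for l in lines if re.search(regex, l)]

result = filter_pick_twice(["foo", "bar", "baz", "qurx", "bother"], "(a|r$)")
print(result)  # ['a', 'a', 'r']
```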

WoLpH
Nah, no need to match each line twice, see my answer.
Alex Martelli
Indeed, your answer is much cleaner, +1 from me :)
WoLpH