If you want to check whether something matches a regex and, if so, print the first group, you do this..

import re
match = re.match(r"(\d+)g", "123g")
if match is not None:
    print(match.group(1))

This is completely pedantic, but the intermediate match variable is a bit annoying..

Languages like Perl do this by creating new $1..$9 variables for match groups, like..

if($blah =~ /(\d+)g/){
    print $1
}

From this reddit comment,

with re_context.match('^blah', s) as match:
    if match:
        ...
    else:
        ...

..which I thought was an interesting idea, so I wrote a simple implementation of it:

#!/usr/bin/env python2.6
import re

class SRE_Match_Wrapper:
    def __init__(self, match):
        self.match = match

    def __exit__(self, type, value, tb):
        pass

    def __enter__(self):
        return self.match

    def __getattr__(self, name):
        # __getattr__ is only called when normal lookup fails, so
        # __enter__ and __exit__ never reach this point; everything
        # else is delegated to the wrapped match object.
        return getattr(self.match, name)

def rematch(pattern, inp):
    matcher = re.compile(pattern)
    # Wrap the result (a match object or None) so it can be used with "with"
    return SRE_Match_Wrapper(matcher.match(inp))

if __name__ == '__main__':
    # Example:
    with rematch(r"(\d+)g", "123g") as m:
        if m:
            print(m.group(1))

    with rematch(r"(\d+)g", "123") as m:
        if m:
            print(m.group(1))

(This functionality could theoretically be patched into the _sre.SRE_Match object)

It would be nice if you could skip the execution of the with statement's code block, if there was no match, which would simplify this to..

with rematch(r"(\d+)g", "123") as m:
    print(m.group(1)) # only executed if the match occurred

..but this seems impossible based on what I can deduce from PEP 343.
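A minimal sketch of why this looks impossible, assuming the with semantics described in PEP 343: `__enter__` is called before the block's try is set up, so an exception raised there propagates to the caller and `__exit__` never gets the chance to suppress it. The names `SkipIfNoMatch`, `NoMatch` and `try_skip` are hypothetical, made up for this illustration.

```python
import re

class NoMatch(Exception):
    """Hypothetical signal meaning 'the regex did not match'."""

class SkipIfNoMatch(object):
    def __init__(self, pattern, inp):
        self.match = re.match(pattern, inp)
        self.exit_called = False

    def __enter__(self):
        if self.match is None:
            # Raised before the with statement starts guarding the block.
            raise NoMatch()
        return self.match

    def __exit__(self, exc_type, exc_value, tb):
        self.exit_called = True
        # Would suppress NoMatch, but is never reached for __enter__ errors.
        return exc_type is not None and issubclass(exc_type, NoMatch)

def try_skip(pattern, inp):
    """Return (block_ran, escaped, exit_called) for one attempt."""
    mgr = SkipIfNoMatch(pattern, inp)
    block_ran = escaped = False
    try:
        with mgr as m:
            block_ran = True
    except NoMatch:
        escaped = True  # the exception leaked out of the with statement
    return block_ran, escaped, mgr.exit_called
```

With a non-matching input the block is skipped, but only because the exception escapes the with statement entirely, so the caller still needs a try/except around it.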

Any ideas? As I said, this is a really trivial annoyance, almost to the point of being code-golf..

+1  A: 

I don't think using with is the solution in this case. You'd have to raise an exception in the BLOCK part (which is specified by the user) and have the __exit__ method return True to "swallow" the exception. So it would never look good.
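To illustrate the point, here is a rough sketch (not a recommendation) in which the user's block raises an exception itself and `__exit__` returns True to swallow it. The names `rematch_or_skip`, `NoMatch` and `first_group` are assumptions made for this example.

```python
import re

class NoMatch(Exception):
    """Hypothetical exception the block raises when there is no match."""

class rematch_or_skip(object):
    def __init__(self, pattern, inp):
        self.match = re.match(pattern, inp)

    def __enter__(self):
        return self.match

    def __exit__(self, exc_type, exc_value, tb):
        # Returning True tells the with statement to swallow the exception.
        return exc_type is not None and issubclass(exc_type, NoMatch)

def first_group(pattern, inp):
    result = None
    with rematch_or_skip(pattern, inp) as m:
        if m is None:
            raise NoMatch()  # the user still has to write this check
        result = m.group(1)
    return result
```

The explicit check inside the block is exactly the line the question wanted to get rid of, so nothing is gained.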

I'd suggest going for a syntax similar to the Perl syntax. Make your own extended re module (I'll call it rex) and have it set variables in its module namespace:

if rex.match(r'(\d+)g', '123g'):
    print(rex._1)

As you can see in the comments below, this method is neither scope- nor thread-safe. You would only use this if you were completely certain that your application wouldn't become multi-threaded in the future and that any functions called from the scope that you're using this in will also use the same method.
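A rough sketch of how such a `rex` could look, together with the scoping problem just mentioned. Here `rex` is a hypothetical stand-in written as a class instance rather than a real module, only `_1` is implemented, and every pattern is assumed to contain at least one group.

```python
import re

class _Rex(object):
    """Hypothetical stand-in for a 'rex' module with Perl-like group state."""
    _1 = None

    def match(self, pattern, inp):
        m = re.match(pattern, inp)
        # Shared state: every call overwrites the previous result.
        # Assumes the pattern has at least one group.
        self._1 = m.group(1) if m else None
        return m is not None

rex = _Rex()

def helper():
    # An unrelated function that happens to use rex as well.
    rex.match(r"(\w+)", "oops")

def demo():
    seen = []
    if rex.match(r"(\d+)g", "123g"):
        seen.append(rex._1)  # group from our own match
        helper()
        seen.append(rex._1)  # now overwritten by helper()
    return seen
```

After the call to `helper()`, `rex._1` no longer refers to the caller's match, which is the scope problem described above.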

Blixt
Be careful of this. It'll break if the code inside the conditional reuses the "rex" object, so to be safe in that case you'd need to make a copy when you use it--and that's mostly undoing the benefit. It's also not threadsafe unless you jump some TLS hoops; match() on an re object (should be) completely threadsafe. I think these issues far outweigh the benefits of doing it this way--it's a lot to have to keep in mind to save one line of code.
Glenn Maynard
and thread locals to handle multi-threading?
John Montgomery
For some cases I believe the benefits outweigh the issues. For simple single-threaded programs I believe this approach is OK. However, "match = re.match(...); if match: ..." is idiomatic Python. I'll keep doing it that way myself. Still +1 to @Blixt's answer, for an elegant, Perl-like answer to the original question.
codeape
The example given was Perl's magic variables, and they have the same limitations. I can't see any other "nice-looking" way to make the code any shorter than fetching the object and checking if it's not `None`.
Blixt
Elegant and Perl-like are precisely opposite descriptions.
Glenn Maynard
Well for the sake of argument, you could probably have the `rex.match` function keep track of which frame it was executed in and then have a `rex.group` function that would return the group appropriate to the frame it was called from. This isn't something I've tested though and I doubt it would be worth the effort.
Blixt
Have a look at this: http://paste.blixt.org/119661 It's far from perfect, but it works across different frames. I haven't worked with multi-threading in Python so I don't know if they have separate frames per thread, I'm just making a huge assumption here. Anyways, I would never recommend using this code, but it's interesting to look at nevertheless. Since the state object is discarded, multiple calls to `match()` will of course overwrite the old set of groups.
Blixt
@Blixt: threading.local would be an easier approach. And far more elegant. See http://docs.python.org/library/threading.html#threading.local
codeape
@Glenn Maynard: Agreed :-)
codeape
Well, to be honest, I don't think any of these solutions should be looked into at all. To save one line of code, only a simple solution should be considered. I don't even think it's worth making my code thread-safe, because if you want thread-safe code, you should work with objects, and as has been shown, the only way to assign and check an object in one statement is the workaround Glenn posted using a generator, which isn't very intuitive at all.
Blixt
Obviously, there's always a balance of complexity vs. benefit, but it's not saving just one line of code; it's saving one redundant line of code in every place you use a regex, which can be a lot in some applications.
Glenn Maynard
threading.local doesn't help reusing the same regex object in multiple nested frames in the same thread. That's not a case that can be ignored--"I'm not doing that now" is something that will come back and bite you in the face a month later, when your code ends up recursing in ways you didn't originally think of.
Glenn Maynard
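The threading.local suggestion from the comments could be sketched roughly as below. The function names `match`, `group`, `in_threads` and `nested` are made up for this illustration; it is a sketch of the idea, not the code from the paste linked above.

```python
import re
import threading

_state = threading.local()  # each thread gets its own 'last' attribute

def match(pattern, inp):
    _state.last = re.match(pattern, inp)
    return _state.last is not None

def group(n):
    return _state.last.group(n)

def in_threads(texts):
    """Match each text in its own thread and collect group(1) per text."""
    results = {}
    def worker(text):
        if match(r"(\d+)g", text):
            results[text] = group(1)
    threads = [threading.Thread(target=worker, args=(t,)) for t in texts]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

def nested():
    """Same thread, nested use: the inner match clobbers the outer one."""
    if match(r"(\d+)g", "123g"):
        match(r"(\w+)", "inner")  # e.g. buried in some called function
        return group(1)
```

Threads no longer see each other's state, but `nested()` still returns the inner group, which is the case the last comment warns about.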
+4  A: 

I don't think it's trivial. I don't want to have to sprinkle a redundant conditional around my code if I'm writing code like that often.

This is slightly odd, but you can do this with an iterator:

import re

def rematch(pattern, inp):
    matcher = re.compile(pattern)
    matches = matcher.match(inp)
    if matches:
        yield matches

if __name__ == '__main__':
    for m in rematch(r"(\d+)g", "123g"):
        print(m.group(1))

The odd thing is that it's using an iterator for something that isn't iterating--it's closer to a conditional, and at first glance it might look like it's going to yield multiple results for each match.

It does seem odd that a context manager can't cause its managed block to be skipped entirely; while that's not explicitly one of the use cases of "with", it seems like a natural extension.

Glenn Maynard
Yeah, it would work if the `__enter__` code was executed inside the `try` part that gives the `__exit__` code control over the exceptions (because the `__enter__` code could then throw a special type of exception that is swallowed by the `with` statement, effectively stopping any code inside it from executing.) Right now, I don't see a way to get around that though.
Blixt
It'd be nice if Python allowed assignment in expressions, like C: "if x = y():", "if not (x = y()):"; it'd handle this straightforwardly.
Glenn Maynard
+1: This is the only viable solution for saving one line of code. While using a generator isn't very intuitive, it does the job and is both scope- and thread-safe.
Blixt
@Glenn: Yup, that was my first instinct then I remembered Python doesn't do that `=P`
Blixt
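For later readers: the assignment-in-expression wish from the comments above is covered by the `:=` operator added in Python 3.8 (PEP 572). A short sketch, assuming Python 3.8 or newer; the function name `first_group` is made up for the example.

```python
import re

def first_group(pattern, inp):
    # Assign and test the match object in a single expression (Python 3.8+).
    if (m := re.match(pattern, inp)) is not None:
        return m.group(1)
    return None
```

This removes the separate assignment line without any wrapper object or shared state.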
A: 

If you're doing a lot of these in one place, here's an alternative answer:

import re
class Matcher(object):
    def __init__(self):
        self.matches = None
    def set(self, matches):
        self.matches = matches
    def __getattr__(self, name):
        return getattr(self.matches, name)

class re2(object):
    def __init__(self, expr):
        self.re = re.compile(expr)

    def match(self, matcher, s):
        matches = self.re.match(s)
        matcher.set(matches)
        return matches

pattern = re2(r"(\d+)g")
m = Matcher()
if pattern.match(m, "123g"):
    print(m.group(1))
if not pattern.match(m, "x123g"):
    print("no match")

You can compile the regex once with the same thread safety as re, create a single reusable Matcher object for the whole function, and then you can use it very concisely. This also has the benefit that you can reverse it in the obvious way--to do that with an iterator, you'd need to pass a flag to tell it to invert its result.

It's not much help if you're only doing a single match per function, though; you don't want to keep Matcher objects in a broader context than that; it'd cause the same issues as Blixt's solution.
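As a sketch of the "lots of these in one place" case, the same Matcher can be threaded through an if/elif chain, including an inverted test. The `Matcher` and `re2` classes are repeated from above so the example runs on its own; the `classify` function and its patterns are made up for illustration.

```python
import re

class Matcher(object):
    def __init__(self):
        self.matches = None
    def set(self, matches):
        self.matches = matches
    def __getattr__(self, name):
        return getattr(self.matches, name)

class re2(object):
    def __init__(self, expr):
        self.re = re.compile(expr)
    def match(self, matcher, s):
        matches = self.re.match(s)
        matcher.set(matches)
        return matches

# Hypothetical patterns, compiled once at module level.
GRAMS = re2(r"(\d+)g$")
KILOS = re2(r"(\d+)kg$")
DIGIT = re2(r"\d")

def classify(text):
    m = Matcher()  # one reusable Matcher, local to this function
    if GRAMS.match(m, text):
        return ("grams", int(m.group(1)))
    elif KILOS.match(m, text):
        return ("kilograms", int(m.group(1)))
    elif not DIGIT.match(m, text):  # inverted test, no flag needed
        return ("no digits", None)
    return ("unknown", None)
```

Keeping the Matcher local to `classify` avoids the shared-state problem, at the cost of one setup line per function.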

Glenn Maynard