If you compile a regex inside a function, and that function gets called multiple times, does Python recompile the regex each time, or does Python cache the compiled regex (assuming the regex doesn't change)?

For example:

import re

def contains_text_of_interest(line):
    r = re.compile(r"foo\dbar\d")
    return r.match(line)

def parse_file(fname):
    with open(fname) as f:
        for line in f:
            if contains_text_of_interest(line):
                pass  # Do something interesting
A: 

It does the "wrong" thing. Here's a longer thread on the topic:

http://stackoverflow.com/questions/146607/im-using-python-regexes-in-a-criminally-inefficient-manner

koblas
Actually, that link tells you that it does the *right* thing, but that there's a speed penalty for checking the cache.
katrielalex
+3  A: 

If you want to avoid the overhead of calling re.compile() every time, you can do:

def contains_text_of_interest(line, r=re.compile(r"foo\dbar\d")):
    return r.match(line)
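
This works because Python evaluates default argument values once, when the def statement runs, not on every call. Here is a minimal sketch that makes this visible. The compile_counted helper and the compile_calls counter are hypothetical names added only to count how often compilation happens:

```python
import re

compile_calls = 0  # hypothetical counter, only here to observe compilation


def compile_counted(pattern):
    """Compile a pattern and record that a compilation was requested."""
    global compile_calls
    compile_calls += 1
    return re.compile(pattern)


# The default value is evaluated once, when this def statement executes.
def contains_text_of_interest(line, r=compile_counted(r"foo\dbar\d")):
    return r.match(line)


for line in ["foo1bar2", "nothing here", "foo3bar4 trailing"]:
    contains_text_of_interest(line)

print(compile_calls)  # compilation was requested once, at definition time
```

One trade-off to keep in mind: the regex becomes part of the function's signature, so a caller can accidentally (or deliberately) pass a different pattern as the second argument.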
Dingo
+1 My word, I never thought I'd see Python's default argument handling be *useful*.
katrielalex
+1  A: 

Why don't you just put the re.compile outside the function (at module or class level), give it an explicit name, and just use it? A regex like that is effectively a constant, and you can treat it as one.

MATCH_FOO_BAR = re.compile(r"foo\dbar\d")  

def contains_text_of_interest(line):
    return MATCH_FOO_BAR.match(line)
kriss
This is what I've been doing so far, but this forces me to define the regex further away from its use than I would like.
lorin
+1  A: 

Dingo's solution is a good one [edit: Ned Batchelder's explanation is even better], but here's another one which I think is neat: use closures! If that sounds like a "big word" to you, don't worry. The concept is simple:

def make_matching_function():
    matcher = re.compile(r"foo\dbar\d")
    def f(line):
        return matcher.match(line)
    return f
contains_text_of_interest = make_matching_function()

make_matching_function is called only once, so the regex is compiled only once. The function f, which is assigned to contains_text_of_interest, can see the compiled regex matcher because it is defined in the enclosing scope, and it keeps that reference even if you use contains_text_of_interest somewhere else. That is what a closure is: a function that carries its enclosing scope with it.
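
If you need several such functions, the same idea can take the pattern as a parameter. This is only a sketch of one way to generalise it. make_matcher and starts_with_digits are hypothetical names, not part of the code above:

```python
import re


def make_matcher(pattern):
    """Return a function that matches lines against one precompiled pattern."""
    compiled = re.compile(pattern)  # runs once per call to make_matcher

    def matcher(line):
        # 'compiled' is looked up in the enclosing scope (the closure).
        return compiled.match(line)

    return matcher


contains_text_of_interest = make_matcher(r"foo\dbar\d")
starts_with_digits = make_matcher(r"\d+")  # hypothetical second matcher

print(bool(contains_text_of_interest("foo1bar2")))
print(bool(starts_with_digits("abc")))
```

Each returned function holds on to its own compiled pattern, so the two matchers do not interfere with each other.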

Not the most Pythonic solution to this problem, surely. But it's a good idiom to have up your sleeve, for when the time is right :)

rbp
+2  A: 

Actually, if you look at the code in the re module, the re.compile function uses the same cache as all the other functions do, so compiling the same regex over and over again is very cheap (essentially a dictionary lookup). In other words, write the code to be the most understandable, maintainable, or expressive, and don't worry about the overhead of compiling regexes.
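
A quick way to see the cache at work, as a sketch rather than a guarantee: the re documentation says recently compiled patterns are cached, and in CPython that shows up as repeated re.compile calls returning the very same object. The object identity is an implementation detail, so treat it as an observation, not something to rely on. PATTERN is just an illustrative name:

```python
import re

PATTERN = r"foo\dbar\d"

first = re.compile(PATTERN)
second = re.compile(PATTERN)

# In CPython the second call is answered from the re module's cache,
# so both names refer to the same compiled pattern object.
print(first is second)

# re.purge() clears the cache; the next compile builds a fresh object.
re.purge()
third = re.compile(PATTERN)
print(first is third)
```

Even on a cache hit there is still the cost of the function call and the lookup itself, which is the small penalty mentioned in the comment on the first answer.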

Ned Batchelder