In Python, is there a better way to parameterise strings into regular expressions than doing it manually like this:

import re

test = 'flobalob'
names = ['a', 'b', 'c']
for name in names:
    regexp = "%s" % name
    print regexp, re.search(regexp, test)

This noddy example tries to match each name in turn. I know there are better ways of doing that, but it's a simple example purely to illustrate the point.


The answer appears to be no: there's no real alternative. The best way to parameterise regular expressions in Python is as above, or with derivatives such as str.format(). I tried to write a generic question rather than 'fix ma codez, kthxbye'. For those still interested, I've fleshed out an example closer to my needs here:

import os, re

filenames = ['bob.txt', 'fred.txt', 'paul.txt']
for diskfilename in os.listdir('.'):
    for filename in filenames:
        name, ext = filename.split('.')
        regexp = r"%s.*\.%s" % (name, ext)
        m = re.search(regexp, diskfilename)
        if m:
            print diskfilename, regexp, m
            # ...

I'm trying to figure out the 'type' of a file based on its filename, of the form <filename>_<date>.<extension>. In my real code, the filenames array is a dict, containing a function to call once a match is found.
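A minimal sketch of that dict-of-callbacks dispatch (the handler functions, their return values, and the YYYYMMDD date format are assumptions for illustration, not from the original code):

```python
import re

# Hypothetical handlers -- the real code would do something useful here.
def handle_bob(match):
    return 'bob file dated %s' % match.group(1)

def handle_fred(match):
    return 'fred file dated %s' % match.group(1)

# Map each base filename to the function to call on a match,
# for disk names of the form <filename>_<date>.<extension>.
handlers = {'bob.txt': handle_bob, 'fred.txt': handle_fred}

def dispatch(diskfilename):
    for filename, handler in handlers.items():
        name, ext = filename.rsplit('.', 1)
        # re.escape keeps literal dots in name/ext from acting as wildcards.
        regexp = r'%s_(\d+)\.%s' % (re.escape(name), re.escape(ext))
        m = re.match(regexp, diskfilename)
        if m:
            return handler(m)
    return None
```

With this, `dispatch('bob_20090216.txt')` calls `handle_bob` and returns `'bob file dated 20090216'`, while an unknown name returns `None`.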

Other ways I've considered doing it:

  • Have a regular expression in the array. I already have an array of filenames without any regular-expression magic, so I am loath to do this. I have done this elsewhere in my code and it's a mess (though necessary there).

  • Match only on the start of the filename. This would work, but would break with .bak copies of files, etc. At some point I'll probably want to extract the date from the filename so would need to use a regular expression anyway.
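For the date-extraction part, a capture group in the same pattern does the job (a sketch assuming the date is eight digits, YYYYMMDD):

```python
import re

# Assumed form: <filename>_<date>.<extension>, date as YYYYMMDD.
m = re.match(r'bob_(\d{8})\.txt', 'bob_20090216.txt')
date = m.group(1) if m else None
```

The group gives you the date string without a second pass over the filename.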


Thanks for the responses suggesting alternatives to regular expressions to achieve the same end result. I was more interested in parameterising regular expressions, for now and for the future. I'd never come across fnmatch before, so it's all useful in the long run.
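Incidentally, fnmatch can also bridge back to regular expressions: fnmatch.translate() turns a glob pattern into a regex source string, which is itself a form of parameterisation (the exact translated string varies between Python versions, so only its matching behaviour is shown):

```python
import fnmatch
import re

# Glob pattern -> regular-expression source string.
pattern = fnmatch.translate('bob*.txt')
regexp = re.compile(pattern)

matched = regexp.match('bob_20090216.txt') is not None   # True
missed = regexp.match('fred.txt') is not None            # False
```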

+4  A: 

Well, as you build a regexp from a string, I see no other way. But you could parameterise the string itself with a dictionary:

d = {'bar': 'a', 'foo': 'b'}
regexp = '%(foo)s|%(bar)s' % d

Or, depending on the problem, you could use list comprehensions:

vlist = ['a', 'b', 'c']
regexp = '|'.join(vlist)

EDIT: Mat has clarified his question; that makes things different, and the above is no longer relevant.

I'd probably go with an approach like this:

filename = 'bob_20090216.txt'

regexps = {'bob': r'bob_[0-9]+\.txt',
           'fred': r'fred_[0-9]+\.txt',
           'paul': r'paul_[0-9]+\.txt'}

for filetype, regexp in regexps.items():
    m = re.match(regexp, filename)
    if m != None:
        print '%s is of type %s' % (filename, filetype)
paprika
+1 I checked the documentation to make sure, there's no way to do it (other than parametrizing the string as you say). And I don't think Python needs one.
David Zaslavsky
@paprika: I've clarified the example to explain a little better what I'm getting at. @David: Couldn't find anything in the docs myself, but assumed it would be common enough for there to be something - perhaps that something is using strings in this manner.
Mat
`if m:` is sufficient in this case. In general `if obj is not None` is better than `if obj != None`.
J.F. Sebastian
@J.F. Sebastian:Indeed, 'if m: ...' would be enough. I somehow stuck with this since I learned to avoid using the brief 'if v: ...' to check for boolean truth/falseness (which is a whole different story). Could you elaborate on why 'is not' is better? Just because of readability or anything else?
paprika
`is` checks for object identity (object address in memory), therefore it is highly efficient, but I use this form purely for readability.
J.F. Sebastian
A quick timeit test shows no performance gain from using [0-9]+ instead of \d+ -- is there another reason not to use the shorter form?
akaihola
+2  A: 
import fnmatch, os

filenames = ['bob.txt', 'fred.txt', 'paul.txt']

                  # 'b.txt.b' -> 'b.txt*.b'
filepatterns = ((f, '*'.join(os.path.splitext(f))) for f in filenames) 
diskfilenames = filter(os.path.isfile, os.listdir('.'))
pattern2filenames = dict((fn, fnmatch.filter(diskfilenames, pat))
                         for fn, pat in filepatterns)

print pattern2filenames

Output:

{'bob.txt': ['bob20090217.txt'], 'paul.txt': [], 'fred.txt': []}

Answers to previous revisions of your question follow:


I don't understand your updated question, but filename.startswith(prefix) might be sufficient in your specific case.

After you've updated your question the old answer below is less relevant.


  1. Use re.escape(name) if you'd like to match a name literally.

  2. Any tool available for string parametrization is applicable here. For example:

    import string
    print string.Template("$a $b").substitute(a=1, b="B")
    # 1 B
    

    Or using str.format() in Python 2.6+:

    print "{0.imag}".format(1j+2)
    # 1.0
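For item 1, a quick sketch of why re.escape() matters when the name being interpolated contains regex metacharacters such as a dot:

```python
import re

name = 'bob.txt'
regexp = re.escape(name)   # the '.' becomes a literal '\.' in the pattern

# Without escaping, 'bob.txt' as a pattern would also match 'bobxtxt',
# because the unescaped dot matches any character.
literal_only = re.match(regexp, 'bob.txt') is not None      # True
false_positive = re.match(regexp, 'bobxtxt') is not None    # False
```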
    
J.F. Sebastian
+1  A: 

Maybe the glob and fnmatch modules can be of some help to you?

SilentGhost