ansaurus

Question

Python: Convert format string to regular expression

Answer 1

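You can try this. It works around your escaping problems by putting a unique marker in place of every placeholder, escaping the whole string, and then replacing each marker with a capture group.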

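import re

layout = '%(group)s/foo-%(locale)s/file.txt'  # example format string
unique = '_UNIQUE_STRING_'
assert unique not in layout  # the marker must not already occur in the format
regexp = re.escape(layout % {'group': unique, 'locale': unique}).replace(unique, '(.*)')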
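A short usage sketch for the snippet above, assuming Python 3.4+ (for `re.fullmatch`) and the example layout `'%(group)s/foo-%(locale)s/file.txt'` used elsewhere in this thread. Because the groups are unnamed, the captured values come back by position from `groups()`:

```python
import re

layout = '%(group)s/foo-%(locale)s/file.txt'
unique = '_UNIQUE_STRING_'
assert unique not in layout

# Marker in place of every placeholder, escape the literal text,
# then turn each marker into an unnamed capture group.
regexp = re.escape(layout % {'group': unique, 'locale': unique}).replace(unique, '(.*)')

# fullmatch anchors both ends, so trailing text is not silently ignored.
m = re.fullmatch(regexp, 'something/foo-en-gb/file.txt')
print(m.groups())  # ('something', 'en-gb')
```

The dict passed to `%` has to name every placeholder in the layout; a placeholder missing from it raises `KeyError`.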

Paul Hankin 2010-04-16 17:15:54

Answer 2

Since you are using named placeholders, I'd use named groups. This seems to work:

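import re

UNIQ = '_UNIQUE_STRING_'

class MarkPlaceholders(dict):
    # The % operator calls __getitem__ for each %(name)s placeholder; return a
    # named group wrapped in the separator so it can be split out afterwards.
    def __getitem__(self, key):
        return UNIQ + ('(?P<%s>.*?)' % key) + UNIQ

def format_to_re(fmt):
    assert UNIQ not in fmt  # the separator must not occur in the format string
    parts = (fmt % MarkPlaceholders()).split(UNIQ)
    # Even-indexed parts are literal text; odd-indexed parts are the named groups.
    for i in range(0, len(parts), 2):
        parts[i] = re.escape(parts[i])
    return ''.join(parts)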

And then to test (output shown for Python 3.7 or later):

>>> layout = '%(group)s/foo-%(locale)s/file.txt'
>>> print(format_to_re(layout))
(?P<group>.*?)/foo\-(?P<locale>.*?)/file\.txt
>>> pattern = re.compile(format_to_re(layout))
>>> print(pattern.match('something/foo-en-gb/file.txt').groupdict())
{'group': 'something', 'locale': 'en-gb'}
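One possible refinement, not part of the answer above: `pattern.match` only anchors at the start of the string, so when the format ends with a placeholder the non-greedy `.*?` can stop at an empty match. The sketch below (Python 3.4+ for `fullmatch`) repeats the answer's helper and compares the two calls on a hypothetical format `'%(group)s-%(locale)s'`:

```python
import re

UNIQ = '_UNIQUE_STRING_'

class MarkPlaceholders(dict):
    # Each placeholder becomes a named group wrapped in the separator.
    def __getitem__(self, key):
        return UNIQ + ('(?P<%s>.*?)' % key) + UNIQ

def format_to_re(fmt):
    assert UNIQ not in fmt
    parts = (fmt % MarkPlaceholders()).split(UNIQ)
    # Escape only the literal (even-indexed) parts.
    for i in range(0, len(parts), 2):
        parts[i] = re.escape(parts[i])
    return ''.join(parts)

# This format ends in a placeholder, which exposes the difference.
pattern = re.compile(format_to_re('%(group)s-%(locale)s'))
loose = pattern.match('x-en-gb').groupdict()       # locale is '' since .*? stops early
strict = pattern.fullmatch('x-en-gb').groupdict()  # locale runs to the end of the string
print(loose)
print(strict)
```

With `fullmatch` the last group has to extend to the end of the string. A limitation shared by both versions: a format that uses the same placeholder twice produces a duplicate group name, which `re.compile` rejects.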

Duncan 2010-04-16 17:46:56

I had hoped to find a way other than using a unique identifier, but this is an interesting spin on that approach. In particular, I like that I'll only need a single unique separator, rather than one for every field that needs to match a different regular expression.

miracle2k 2010-04-16 20:26:22
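To illustrate the single-separator point, here is a sketch in which each field can carry its own regular expression while all fields still share one separator. The `field_patterns` argument and the `locale` pattern are hypothetical additions for illustration; fields without an entry fall back to `.*?` as in Duncan's answer:

```python
import re

UNIQ = '_UNIQUE_STRING_'

class MarkPlaceholders(dict):
    """Maps each placeholder to a named group, optionally with its own pattern."""
    def __init__(self, field_patterns=None):
        super().__init__()
        # Hypothetical: field name -> regex for that field; default is '.*?'.
        self.field_patterns = field_patterns or {}

    def __getitem__(self, key):
        sub = self.field_patterns.get(key, '.*?')
        # Still one separator, however many fields have custom patterns.
        return UNIQ + ('(?P<%s>%s)' % (key, sub)) + UNIQ

def format_to_re(fmt, field_patterns=None):
    assert UNIQ not in fmt
    parts = (fmt % MarkPlaceholders(field_patterns)).split(UNIQ)
    # Even-indexed parts are literal text from fmt; escape only those.
    for i in range(0, len(parts), 2):
        parts[i] = re.escape(parts[i])
    return ''.join(parts)

# Hypothetical per-field pattern: two-letter language code, optional region.
regex = format_to_re('%(group)s/foo-%(locale)s/file.txt',
                     {'locale': r'[a-z]{2}(?:-[a-z]{2})?'})
m = re.fullmatch(regex, 'something/foo-en-gb/file.txt')
print(m.groupdict())
```

Because the locale pattern only accepts a two-letter code, `re.fullmatch(regex, 'something/foo-english/file.txt')` returns `None`.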

If the unique separator worries you too much, you could always include a number in it and increment the number until you get something that isn't in the format string.

Duncan 2010-04-17 11:54:02
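The numbering idea could look like the sketch below. `pick_marker` and the `'UNIQUE'` base are hypothetical names chosen for illustration, and the marker is passed into `MarkPlaceholders` instead of living in a module-level constant:

```python
import re

def pick_marker(fmt, base='UNIQUE'):
    """Append 0, 1, 2, ... to base until the result does not occur in fmt."""
    # 'UNIQUE' + digits never overlaps with itself (no proper prefix equals a
    # suffix), so literal text next to a real marker cannot form a fake one.
    n = 0
    while base + str(n) in fmt:
        n += 1
    return base + str(n)

class MarkPlaceholders(dict):
    """Wraps each named group in the chosen marker so it can be split out later."""
    def __init__(self, marker):
        super().__init__()
        self.marker = marker

    def __getitem__(self, key):
        return self.marker + ('(?P<%s>.*?)' % key) + self.marker

def format_to_re(fmt):
    marker = pick_marker(fmt)
    parts = (fmt % MarkPlaceholders(marker)).split(marker)
    # Even-indexed parts are literal text from fmt; escape only those.
    for i in range(0, len(parts), 2):
        parts[i] = re.escape(parts[i])
    return ''.join(parts)

fmt = 'UNIQUE0/%(group)s/foo-%(locale)s'  # already contains the first candidate
print(pick_marker(fmt))  # UNIQUE1
m = re.fullmatch(format_to_re(fmt), 'UNIQUE0/something/foo-en-gb')
print(m.groupdict())
```

Only one marker is ever used, so the counter just has to find a single free value.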
