When you use variables (is that the correct word?) in Python regular expressions like this: "blah (?P<value>\w+)" ("value" would be the variable), how can you make the variable's value be the text after "blah " up to the end of the line or up to a certain character, without paying any attention to the actual content of the variable? For example, this is pseudo-code for what I want:

>>> import re
>>> p = re.compile("say (?P<value>continue_until_text_after_assignment_is_recognized) endsay")
>>> m = p.match("say Hello hi yo endsay")
>>> m.group('value')
'Hello hi yo'

Note: The title is probably not understandable. That is because I didn't know how to say it. Sorry if I caused any confusion.

+2  A: 

For that you'd want a regular expression of

"say (?P<value>.+) endsay"

The period matches any character except a newline, and the plus sign indicates that it should be repeated one or more times, so .+ means any sequence of one or more characters. When you put endsay at the end, the regular expression engine will make sure that whatever it matches does in fact end with that string.
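
As a minimal sketch of the pattern above applied to the question's own example (the variable names pattern, match, and value are only illustrative):

```python
import re

# Greedy capture: .+ takes as many characters as it can
# while still leaving " endsay" for the rest of the pattern.
pattern = re.compile(r"say (?P<value>.+) endsay")

match = pattern.match("say Hello hi yo endsay")

# match is None when the input does not fit the pattern, so guard the lookup.
value = match.group("value") if match else None
print(value)  # Hello hi yo
```

Using a raw string (the r prefix) is not required for this particular pattern, but it avoids surprises once backslash sequences such as \w appear.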

David Zaslavsky
Worked perfectly! Thank you.
+3  A: 

You need to specify what you want to match if the text is, for example,

say hello there and endsay but some more endsay

If you want to match the whole "hello there and endsay but some more" substring, @David's answer is correct. Otherwise, to match just "hello there and", the pattern needs to be:

say (?P<value>.+?) endsay

with a question mark after the plus sign to make it non-greedy (by default it's greedy, gobbling up all it possibly can while allowing an overall match; non-greedy means it gobbles as little as possible, again while allowing an overall match).
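
A short sketch comparing the two quantifiers on the sample text from this answer may make the difference concrete (the names text, greedy, and lazy are illustrative, not part of the re module):

```python
import re

text = "say hello there and endsay but some more endsay"

# Greedy: .+ extends to the last " endsay" in the text.
greedy = re.match(r"say (?P<value>.+) endsay", text)

# Non-greedy: .+? stops at the first " endsay" it can reach.
lazy = re.match(r"say (?P<value>.+?) endsay", text)

greedy_value = greedy.group("value")
lazy_value = lazy.group("value")

print(greedy_value)  # hello there and endsay but some more
print(lazy_value)    # hello there and
```

Both patterns give the same result when the text contains only one endsay, so the choice only matters when the terminator can appear more than once.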

Alex Martelli
I thought about that but given the OP's talk about matching to the end of the line, it seemed like a greedy operator would be appropriate. Still, +1.
David Zaslavsky