ansaurus

Question

How can I omit words in the middle of a regular expression in Python?

Answer 1

+1 A:

reg = "Togo.*Togo.*Togo(.*)ACTIVE"

Alternatively, if you want to match the string between the last occurrence of Togo and the following occurrence of ACTIVE, and the number of Togo occurrences is not necessarily three, try this:

reg = "Togo(([^T]|T[^o]|To[^g]|Tog[^o])*T?.?.?)ACTIVE"
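As a rough check, both patterns above can be run through Python's re module. The sample text is made up for illustration (the original input isn't reproduced here), so treat this as a sketch rather than a guarantee for real data:

```python
import re

# Hypothetical sample input, not taken from the question.
text = "Togo first Togo second Togo third ACTIVE rest"

# Fixed-count version: written for three occurrences of "Togo".
fixed = re.search(r"Togo.*Togo.*Togo(.*)ACTIVE", text)

# Variable-count version: the alternation is intended to keep the
# captured group from running across another "Togo".
flexible = re.search(r"Togo(([^T]|T[^o]|To[^g]|Tog[^o])*T?.?.?)ACTIVE", text)

# Both groups hold the text between the last "Togo" and "ACTIVE".
print(repr(fixed.group(1)))
print(repr(flexible.group(1)))
```

On this sample both searches capture the same span. Note that the capture group starts after Togo, so the word itself is not part of group 1.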

Igor ostrovsky 2009-09-17 23:36:18

Answer 2

+1 A:

This matches just the desired parts:

.*(Togo.*?)(ACTIVE.*)

The leading .* is greedy, so the following Togo matches at the last possible place. The captured part starts at the last Togo.

In your expression, ^[Togo]*? doesn't do the right thing. ^ tries to match the beginning of a line, and [Togo] matches any single one of the characters T, o or g. Even [^Togo] wouldn't work, since it just matches any single character that is not T, o or g.
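Here is a minimal sketch of how the greedy prefix behaves in Python. The sample string is invented for the example and is not the data from the question:

```python
import re

# Hypothetical sample input, not taken from the question.
text = "Togo first Togo second Togo third ACTIVE rest"

pattern = re.compile(r".*(Togo.*?)(ACTIVE.*)")
match = pattern.match(text)

# The greedy ".*" swallows everything up to the last "Togo",
# so group 1 starts at that last occurrence.
last_block, tail = match.groups()
print(repr(last_block))  # 'Togo third '
print(repr(tail))        # 'ACTIVE rest'
```

If the real input spans several lines, re.DOTALL would probably be needed as well, since . does not match a newline by default.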

sth 2009-09-17 23:50:37

Duh... much simpler than my attempt.

Igor ostrovsky 2009-09-18 01:56:25

In general this seems to be the best suggestion, but in my case it takes too much time. Still, I think this is the best approach if it's fast enough.

Tony 2009-09-24 11:24:36

Answer 3

+1 A:

"(Togo(?:(?!Togo).)*)(ACTIVE.*)"

The square brackets in your regex form a character class that matches one of the characters 'T', 'o', or 'g'. The caret ('^') matches the beginning of the input when it is outside a character class; as the first character inside the square brackets it inverts the character class instead.

In my regex, after matching the word "Togo" I match one character at a time, but only after I check that it isn't the start of another instance of "Togo". (?!Togo) is called a negative lookahead.
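To see the negative lookahead at work, here is a small sketch using Python's re module. The sample text is an assumption made for the example, not the input from the question:

```python
import re

# Hypothetical sample input, not taken from the question.
text = "Togo first Togo second Togo third ACTIVE rest"

# (?:(?!Togo).)* consumes one character at a time, but only while
# the next four characters are not "Togo".
pattern = re.compile(r"(Togo(?:(?!Togo).)*)(ACTIVE.*)")
match = pattern.search(text)

print(repr(match.group(1)))  # 'Togo third '
print(repr(match.group(2)))  # 'ACTIVE rest'
```

Attempts starting at the first and second Togo fail because the lookahead stops them before they can reach ACTIVE, so the search only succeeds from the last Togo.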

Alan Moore 2009-09-19 23:43:08
