tags:

views:

169

answers:

4

Long story short, I have two regex patterns. One pattern matches things that I want to replace, and the other pattern matches a special case of those patterns that should not be replace. For a simple example, imagine that the first one is "\{.*\}" and the second one is "\{\{.*\}\}". Then "{this}" should be replaced, but "{{this}}" should not. Is there an easy way to take a string and say "substitute all instances of the first string with "hello" so long as it is not matching the second string"?

In other words, is there a way to make a regex that is "matches the first string but not the second" easily without modifying the first string? I know that I could modify my first regex by hand to never match instances of the second, but as the first regex gets more complex, that gets very difficult.

+1  A: 

You could replace all the {} instances with your replacement string (which would include the {{}} ones), and then replace the {{}} ones with a back-reference to itself (overwriting the first replace with the original data) -- then only the {} instances would have changed.

jcoon
Could you give me an example of how to do that in python? I'm not quite sure I understand. Thanks!
Nate
I can in about an hour, but not right now, sorry...
jcoon
+2  A: 

You can give replace a function (reference)

But make sure the first regex contain the second one. This is just an example:

regex1 = re.compile('\{.*\}')
regex2 = re.compile('\{\{.*\}\}')

def replace(match):
    match = match.group(0)
    if regex2.match(match):
        return match
    return 'replacement'


regex1.sub(replace, data)
Nadia Alramli
A: 

It strikes me as sub-optimal to match against two different regexes when what you're looking for is really one pattern. To illustrate:

import re
foo = "{{this}}"
bar = "{that}"
re.match("\{[^\{].*[^\}]\}", foo)  # gives you nothing
re.match("\{[^\{].*[^\}]\}", bar)  # gives you a match object

So it's really your regex which could be a bit more precise.

JosefAssad
That's right, but Nate gave a simple imaginary example. It's not always the case.
Nadia Alramli
+4  A: 

Using negative look-ahead/behind assertion

pattern = re.compile( "(?<!\{)\{(?!\{).*?(?<!\})\}(?!\})" )
pattern.sub( "hello", input_string )

Negative look-ahead/behind assertion allows you to compare against more of the string, but is not considered as using up part of the string for the match. There is also a normal look ahead/behind assertion which will cause the string to match only if the string IS followed/preceded by the given pattern.

That's a bit confusing looking, here it is in pieces:

"(?<!\{)"  #Not preceded by a {
"\{"       #A {
"(?!\{)"   #Not followed by a {
".*?"      #Any character(s) (non-greedy)
"(?<!\})"  #Not preceded by a } (in reference to the next character)
"\}"       #A }
"(?!\})"   #Not followed by a }

So, we're looking for a { without any other {'s around it, followed by some characters, followed by a } without any other }'s around it.

By using negative look-ahead/behind assertion, we condense it down to a single regular expression which will successfully match only single {}'s anywhere in the string.

Also, note that * is a greedy operator. It will match as much as it possibly can. If you use "\{.*\}" and there is more than one {} block in the text, everything between will be taken with it.

"This is some example text {block1} more text, watch me disappear {block2} even more text"

becomes

"This is some example text hello even more text"

instead of

"This is some example text hello more text, watch me disappear hello even more text"

To get the proper output we need to make it non-greedy by appending a ?.

The python docs do a good job of presenting the re library, but the only way to really learn is to experiment.

spbogie