Is there a way to modify the value of a backreference?

Example: in the following text

"this is a test"

the word 'test' should be extracted and inserted into another text via a backreference.

Regex:

(test)

Replacement:

"this is another \1"

That works fine so far. But is it possible to modify the backreference before it is inserted, for example by converting the word "test" to uppercase?

I think it could look like:

"this is another \to_upper\1"

Is something like this defined in the "standard" for regular expressions (is there any standard at all)?

+1  A: 

Many implementations (JavaScript, Python, etc.) let you pass a function as the replacement argument. The function normally receives the whole matched string, its position in the input string, and the captured groups. The string it returns is used as the replacement text.

Here is how to do it in JavaScript: the replace function takes the whole matched substring as its first argument, the values of the captured groups as the next n arguments, then the index of the match in the original input string, and finally the whole input string.

var s = "this is a test. and this is another one.";
console.log("replacing");
// The callback runs once per match; its return value replaces that match.
var r = s.replace(/(this is) ([^.]+)/g, function(match, first, second, pos, input) {
  console.log("matched   :" + match);
  console.log("1st group :" + first);
  console.log("2nd group :" + second);
  console.log("position  :" + pos);
  console.log("input     :" + input);
  return "That is " + second.toUpperCase();
});
console.log("replaced string is");
console.log(r);

Output:

replacing
matched   :this is a test
1st group :this is
2nd group :a test
position  :0
input     :this is a test. and this is another one.
matched   :this is another one
1st group :this is
2nd group :another one
position  :20
input     :this is a test. and this is another one.
replaced string is
That is A TEST. and That is ANOTHER ONE.

And here is the Python version. Its match object also gives you start/end positions for each group:

#!/usr/bin/env python3
import re

s = "this is a test. and this is another one."
print("replacing")

def repl(match):
    # Called once per match; the return value becomes the replacement text.
    print("matched   :%s" % match.group(0))
    print("1st group :%s" % match.group(1))
    print("2nd group :%s" % match.group(2))
    # start() is the offset of the whole match; start(n) is the offset of group n.
    print("position  :%d %d %d" % (match.start(), match.start(1), match.start(2)))
    print("input     :%s" % match.string)
    return "That is %s" % match.group(2).upper()

print("replaced string is \n%s" % re.sub(r"(this is) ([^.]+)", repl, s))

Output:

replacing
matched   :this is a test
1st group :this is
2nd group :a test
position  :0 0 8
input     :this is a test. and this is another one.
matched   :this is another one
1st group :this is
2nd group :another one
position  :20 20 28
input     :this is a test. and this is another one.
replaced string is 
That is A TEST. and That is ANOTHER ONE.
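
Applied to the exact example from the question, the same technique can be sketched as follows. This is only a minimal illustration: the helper name upper_backref and the pattern "a (test)" are assumptions made for this sketch, not something taken from the question.

```python
import re

def upper_backref(text):
    # Hypothetical helper for illustration only.
    # re.sub calls the lambda once per match and uses its return value
    # as the replacement, so group 1 can be uppercased before insertion.
    return re.sub(r"a (test)", lambda m: "another " + m.group(1).upper(), text)

result = upper_backref("this is a test")
print(result)
```

The lambda plays the role the hypothetical "\to_upper\1" syntax would have played: the captured group is transformed in ordinary code instead of inside the replacement string.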
Amarghosh