I want to use a look-ahead regex for replacement with the System.Text.RegularExpressions.Regex.Replace(...) method.

Now I am wondering what's wrong. Look at this example code:

        string input = "stackoverflow";
        string replacement = "#";

        var pattern1 = "(?=[a-z])o";
        var result1 = Regex.Replace(input, pattern1, replacement);
        // result1 = stack#verfl#w (as expected)

        var pattern2 = "(?=[a-k])o";
        var result2 = Regex.Replace(input, pattern2, replacement);
        // result2 = stackoverflow (expected: stack#overflow)

        var pattern3 = "(?=k)o";
        var result3 = Regex.Replace(input, pattern3, replacement);
        // result3 = stackoverflow (expected: stack#overflow)

        var pattern4 = "[a-k]";
        var result4 = Regex.Replace(input, pattern4, replacement);
        // result4 = st###ov#r#low (as expected)

        var pattern5 = "([a-k])o";
        var result5 = Regex.Replace(input, pattern5, "$1#");
        // result5 = stack#verflow (as expected)

That is very odd. I can use [a-z] in my look-ahead expression but not [a-k] or even k. What I really want is the result of the last example (pattern5). That is a workaround, but I am curious why pattern2 and pattern3 don't return the expected results.

+1  A: 

The problem is that (?=...) tests the input at the current position but doesn't consume it. So (?=[a-z]) is checked against the "o" without moving forward, and then the o in your pattern matches that same "o".

Did you want a lookbehind? Would "(?<=k)o" work for you?

Damien_The_Unbeliever
+2  A: 

I think you have the idea of look-ahead slightly muddled. When you say, in pattern 2,

(?=[a-k])o

this means "when we're at a place where the next character is something from a to k, match an o". Unsurprisingly, this never matches - if we are at a place where the next character is something from a to k, it's not also going to be an o!

Pattern 3 is even starker:

(?=k)o

means "when we're at a place where the next character is k, match an o". Again, no matches.

The reason pattern1 does match is because o is in a-z, so whenever the next character is o, it also meets the criterion of being 'something in a to z', and so matches.
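
To see the zero-width behaviour concretely, here is a minimal sketch that lists where each look-ahead pattern from the question matches in "stackoverflow". The class name LookaheadDemo and the helper MatchIndexes are made up for illustration; only Regex.Matches and Match.Index come from .NET.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaheadDemo
{
    // Hypothetical helper: returns the start index of every match of pattern in input.
    public static int[] MatchIndexes(string input, string pattern)
    {
        MatchCollection matches = Regex.Matches(input, pattern);
        int[] indexes = new int[matches.Count];
        for (int i = 0; i < matches.Count; i++)
        {
            indexes[i] = matches[i].Index;
        }
        return indexes;
    }

    public static void Main()
    {
        string input = "stackoverflow";

        // The look-ahead and the literal 'o' are both tested against the SAME character.
        foreach (string pattern in new[] { "(?=[a-z])o", "(?=[a-k])o", "(?=k)o" })
        {
            int[] hits = MatchIndexes(input, pattern);
            Console.WriteLine(pattern + " -> " + hits.Length + " match(es)");
            foreach (int index in hits)
            {
                Console.WriteLine("  at index " + index);
            }
        }
    }
}
```

Only the first pattern should report matches (the two o's at indexes 5 and 11), because o is the only character that satisfies both the look-ahead and the literal at the same position.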

I'm also not sure why you expected the # not to replace the o, in 2 and 3. If 5 is doing what you want, then surely you can just use it? I can't readily see how to use look-ahead to do what 5 is doing - to carry out the operation 'match (just) an o that is preceded by something from a to k', look-behind seems the obvious answer:

(?<=[a-k])o

should match just such an o (but I haven't run this myself).
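
For anyone who wants to check it, a small sketch comparing the look-behind version with the capture-group workaround from pattern5 might look like this. The class and method names are invented for the example; the patterns and input are the ones from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookbehindDemo
{
    // Look-behind: asserts that the previous character is in a-k without consuming it,
    // so only the 'o' itself is replaced.
    public static string ReplaceWithLookbehind(string input)
    {
        return Regex.Replace(input, "(?<=[a-k])o", "#");
    }

    // Workaround from the question (pattern5): consume the preceding character
    // and put it back with $1.
    public static string ReplaceWithCapture(string input)
    {
        return Regex.Replace(input, "([a-k])o", "$1#");
    }

    public static void Main()
    {
        string input = "stackoverflow";
        Console.WriteLine(ReplaceWithLookbehind(input)); // stack#verflow
        Console.WriteLine(ReplaceWithCapture(input));    // stack#verflow
    }
}
```

Both give stack#verflow for this input: the o after k is replaced, while the o after l is left alone because l is outside a-k.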

AakashM
Yes, I mixed up look-ahead with look-behind. Shame on me, my fault. Thanks for this detailed explanation.
SchlaWiener