I need to use Regex.Replace to replace text only within certain named groups of my input string.

So I might have a pattern like:

"^(?<NoReplace>.+)(?<FirstPeriod>(\d{2})|CM|RM|PM|CN|RN){1}(?<LastPeriod>(\d{2})|CM|RM|PM|CN|RN){1}((#(?<NumberFormat>[#,\.\+\-%0]+))*)$"

Tokens such as CM and RM are replaced using Regex.Replace with a MatchEvaluator. However, only characters in the FirstPeriod and LastPeriod groups should be replaced.

Example input: "FIELDCNS 01CM"

Desired output: "FIELDCNS 0104"

Incorrect output: "FIELD**04**S 0104"

Is this possible, or am I better off just pulling out the parts I want to replace and reassembling them afterwards?

thanks

A: 

I'm not entirely sure I understand what you're asking, but if you want to replace some strings only between parts you're matching with regular expressions, then the trick is to capture all the bits you don't want to replace and put them back in the replacement string. For example, to replace a "blah" with "XXXXX" but only in between a "foo" and a "bar", you could do:

Dim regex As Regex = new Regex("(foo.*)blah(.*bar)")
Console.WriteLine(regex.Replace( _
 "blah foo bar baz blah baz bar blah blah foo blah", "$1XXXXX$2"))
Console.ReadLine()

Output:

blah foo bar baz XXXXX baz bar blah blah foo blah
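
Since the question uses named groups, the same trick can presumably be written with .NET's `${name}` substitution syntax instead of `$1` and `$2`. This is only a minimal sketch; the group names `pre` and `post` are made up for illustration and do not come from the question.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module NamedGroupDemo
    Sub Main()
        ' "pre" and "post" are illustrative names for the parts that must stay unchanged.
        Dim re As New Regex("(?<pre>foo.*)blah(?<post>.*bar)")
        ' ${pre} and ${post} re-insert the captured text around the replacement.
        Console.WriteLine(re.Replace( _
            "blah foo bar baz blah baz bar blah blah foo blah", "${pre}XXXXX${post}"))
    End Sub
End Module
```

This should print the same line as the numbered-group version, because only the group references differ.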

IRBMe
A: 

You could have something like this:

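' pattern is the regular expression from the question.
Dim evaluator As MatchEvaluator = AddressOf PeriodReplace
Dim result As String = Regex.Replace("FIELDCNS 01CM", pattern, evaluator)

Public Function PeriodReplace(match As Match) As String
    ' Tokens to replace; they are only applied to the two period groups.
    Dim replaceTokens As New Regex("(CM|RM)")
    Dim replaceText As String = "04"
    ' The "#" separator sits outside the NumberFormat group, so re-add it when present.
    Dim numberFormat As Group = match.Groups("NumberFormat")
    Return match.Groups("NoReplace").Value & _
        replaceTokens.Replace(match.Groups("FirstPeriod").Value, replaceText) & _
        replaceTokens.Replace(match.Groups("LastPeriod").Value, replaceText) & _
        If(numberFormat.Success, "#" & numberFormat.Value, "")
End Function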
Ahmad Mageed
My MatchEvaluator function is actually complex enough as it is so I think this would make things more difficult!
Richard Bysouth
A: 

If you want to replace with more than one thing, you have to get more than one match. That means your pattern should match only the parts of the input you want to replace, whereas yours currently matches the whole string in one go. I think the missing piece here is lookbehind and lookahead.

(?<=.)(\d{2})(?=(\d{2}|CM|RM|PM|CN|RN)|(((#(?<NumberFormat>[#,\.\+\-%0]+))*)$))

This matches two digits that are preceded by any character and followed by either (two digits or CM or RM...) or (an optional number format and the end of the input); only those two digits get replaced. The lookahead (?=) and lookbehind (?<=) groups don't count as part of the match, so they don't get replaced.

This means that for a string like:

"FIELDCNS 01CM02CN"

You would get two calls to your MatchEvaluator, and you could get:

"FIELDCNS XXCMYYCN"

If you just want to replace all the "01" matches in the input with "04", then you don't need a MatchEvaluator at all.
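
As a rough sketch of how this could look in code, the program below runs the lookaround pattern above through Regex.Replace with a MatchEvaluator. The evaluator `WrapDigits` is a hypothetical stand-in for whatever per-match logic the real application uses; it just wraps each match in brackets to show which characters are affected.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module LookaroundDemo
    ' Hypothetical evaluator: marks each match so the replaced span is visible.
    Function WrapDigits(m As Match) As String
        Return "[" & m.Value & "]"
    End Function

    Sub Main()
        ' The lookbehind and lookahead are checked but not consumed,
        ' so only the two digits are handed to the evaluator.
        Dim pattern As String = _
            "(?<=.)(\d{2})(?=(\d{2}|CM|RM|PM|CN|RN)|(((#(?<NumberFormat>[#,\.\+\-%0]+))*)$))"
        Dim result As String = _
            Regex.Replace("FIELDCNS 01CM02CN", pattern, AddressOf WrapDigits)
        Console.WriteLine(result) ' prints: FIELDCNS [01]CM[02]CN
    End Sub
End Module
```

The evaluator is called once for "01" and once for "02"; the rest of the input, including "CNS" in the field name, is never part of a match.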

Tim Sylvester
Unfortunately I don't think that will work as I do actually need to use the group <NoReplace> elsewhere in my app, so it does need to be part of the match.
Richard Bysouth