I'm pulling out my hair over the following function:

Public Function SetVersion(ByVal hl7Message As String, ByVal newVersion As String) As String
    Dim rgx = New Regex("^(?<pre>.+)(\|\d\.\d{1,2})$", RegexOptions.Multiline)
    Dim m = rgx.Match(hl7Message)
    Return rgx.Replace(hl7Message, "${pre}|" & newVersion, 1, 0)
End Function

For simplicity, I'm testing against the following input:

dsfdsaf|2.1
wretdfg|2.2
sdafasd3|2.3

What I need to accomplish is to replace "|2.1" in the first line with another value, say "|2.4". What happens instead is that "|2.3" in the last line gets replaced. It's as if I hadn't specified Multiline mode. Moreover, the following online tool returned correct matches. So, if anyone can see a mistake in my regex or code, please point it out. Thanks.

A: 

You can use '?' to make the '+' lazy instead of greedy. It will grab as few characters as it can while still fulfilling the regex.

Dim rgx = New Regex("^(?<pre>.+?)(\|\d\.\d{1,2})$", RegexOptions.Multiline)

If you know that the text preceding the version number will not contain any pipes, you could also replace the . with the [^\|] character class.

Dim rgx = New Regex("^(?<pre>[^\|]+)(\|\d\.\d{1,2})$", RegexOptions.Multiline)
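
As a rough illustration of how the three variants differ, the sketch below runs each pattern against a single line whose prefix itself contains a pipe. The module name, the Describe helper and the sample line are hypothetical, not taken from the question. Because the pattern is anchored at both ends, the lazy and greedy forms should capture the same prefix here, while the [^\|] form cannot match at all.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module QuantifierSketch
    ' Hypothetical helper: returns the captured "pre" group, or "(no match)".
    Public Function Describe(ByVal pattern As String, ByVal line As String) As String
        Dim m As Match = Regex.Match(line, pattern, RegexOptions.Multiline)
        If m.Success Then
            Return m.Groups("pre").Value
        End If
        Return "(no match)"
    End Function

    Sub Main()
        ' Made-up sample line: the prefix contains a pipe of its own.
        Dim line As String = "a|b|2.1"

        ' Greedy: .+ grabs the whole line, then backtracks to the last "|2.1".
        Console.WriteLine(Describe("^(?<pre>.+)(\|\d\.\d{1,2})$", line))
        ' Lazy: .+? grows one character at a time until the rest can match.
        Console.WriteLine(Describe("^(?<pre>.+?)(\|\d\.\d{1,2})$", line))
        ' Negated class: the prefix may not contain a pipe, so this line fails.
        Console.WriteLine(Describe("^(?<pre>[^\|]+)(\|\d\.\d{1,2})$", line))
    End Sub
End Module
```

So on input like this, switching between lazy and greedy does not change which text is captured; it only changes how the engine gets there.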
Andrew Rueckert
I actually tried the lazy option before, but it also didn't work. There are pipes in the <pre>. I just gave a simplified example for input. Thanks though.
Antony Highsky
+2  A: 

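With the $ in place, the pattern ends up matching only the last occurrence, at the end of the string. In Multiline mode .NET's $ matches only before \n, so if your lines end in \r\n the \r sits between the version number and the $. If you want to match the first occurrence, remove the $ or specify that a newline is expected: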

"^(?<pre>.+)(\|\d\.\d{1,2})"

or

"^(?<pre>.+)(\|\d\.\d{1,2})[\r\n]"

Based on your comment about using Multiline and the appearance of your test data, I imagine your input is on multiple lines. Use the above pattern and try this:

Dim input As String = "dsfdsaf|2.1" & Environment.NewLine & _
                       "wretdfg|2.2" & Environment.NewLine & _
                       "sdafasd3|2.3"

Console.WriteLine("Before:")
Console.WriteLine(input)
Console.WriteLine("After:")
Console.WriteLine(SetVersion(input, "2.4"))

2.1 should change to 2.4.
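
If you would rather keep the end-of-line anchor, one option is to allow an optional carriage return in front of it. The sketch below assumes the message uses \r\n line endings. SetVersionCrLf, OriginalPattern and the module name are hypothetical, and the lookahead (?=\r?$) is just one way to write it, chosen so that the \r is not swallowed by the replacement.

```vbnet
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module SetVersionSketch
    ' The pattern from the question, kept only for comparison.
    Public Const OriginalPattern As String = "^(?<pre>.+)(\|\d\.\d{1,2})$"

    ' Hypothetical variant of SetVersion: the lookahead accepts an optional \r
    ' before the end of the line without making it part of the match.
    Public Function SetVersionCrLf(ByVal hl7Message As String, ByVal newVersion As String) As String
        Dim rgx As New Regex("^(?<pre>.+)\|\d\.\d{1,2}(?=\r?$)", RegexOptions.Multiline)
        ' Replace only the first match.
        Return rgx.Replace(hl7Message, "${pre}|" & newVersion, 1)
    End Function

    Sub Main()
        ' Explicit \r\n so the behaviour does not depend on the platform.
        Dim input As String = "dsfdsaf|2.1" & vbCrLf &
                              "wretdfg|2.2" & vbCrLf &
                              "sdafasd3|2.3"

        ' With \r\n endings the original pattern first matches on the third line.
        Dim first As Match = Regex.Match(input, OriginalPattern, RegexOptions.Multiline)
        Console.WriteLine("Original pattern matches at index " & first.Index)

        ' The variant changes the first line and leaves the line endings alone.
        Console.WriteLine(SetVersionCrLf(input, "2.4"))
    End Sub
End Module
```

With input that uses plain \n line endings, the \r? simply matches nothing, so the same function should behave like the original pattern was meant to.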

Ahmad Mageed
Removing the $ worked. Thank you! I was under the impression that specifying multiline mode changed the meaning of $ to match end of line instead of end of string. I am used to referencing this guide: http://www.regular-expressions.info/reference.html. Is this then just an eccentricity of the .NET flavor of regex, or am I misunderstanding something?
Antony Highsky
+2  A: 

Ahmad Mageed beat me to it. Removal of the $ is required. In the following code, each of your 3 lines is matched, with 2.1 being the first match.

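using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main(string[] args)
    {
        string myData = "dsfdsaf|2.1" + Environment.NewLine +
                        "wretdfg|2.2" + Environment.NewLine +
                        "sdafasd3|2.3";

        // No trailing $: each line yields one match, starting with the 2.1 line.
        Regex rex = new Regex(@"^(?<pre>.+)(\|\d\.\d{1,2})", RegexOptions.Multiline);
        MatchCollection m = rex.Matches(myData);
        foreach (Match match in m)
        {
            // Print the matched text, one match per line.
            Console.WriteLine(match.Value);
        }
    }
}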
JonWillis
+1 thx. Pls see my comment to Ahmad.
Antony Highsky
@Draak, glad it worked for you. Regular expressions can be a bit of trial and error.
JonWillis