I am testing the regular expression $ anchor in the .NET Framework, and the result is unexpected: $ matches only at the end of the last line. I understand that the Multiline option matters here, and I am already using it. Can anyone explain why this happens? My test code follows.

Thanks, Fred

        string sourceText = 
@"ab<br />
ab<br />
ab";

        //var m = Regex.Match(sourceText, "^a", RegexOptions.Multiline); // this returns 3 matches
        var m = Regex.Match(sourceText, "b$", RegexOptions.Multiline); // this returns only one match
        while (m.Success)
        {
            Console.Write(m.Value);
            m = m.NextMatch();
        }
+4  A: 

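In Multiline mode, $ matches only immediately before \n (or at the very end of the string), not before \r\n, which is what your string contains when a C# verbatim literal spans several lines in a source file saved with Windows line endings.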

The regex b(?=\r?$) will do what you expect.

See http://msdn.microsoft.com/en-us/library/h5181w5w.aspx for an explanation.
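
To make the difference concrete, here is a minimal sketch that counts matches for both patterns. It assumes the input uses \r\n line endings and no `<br />` text; the class and method names (`DollarAnchorDemo`, `CountMatches`) are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DollarAnchorDemo
{
    // Hypothetical helper: number of matches of pattern in text, Multiline mode.
    public static int CountMatches(string text, string pattern)
    {
        return Regex.Matches(text, pattern, RegexOptions.Multiline).Count;
    }

    public static void Main()
    {
        // Assumed input: three lines separated by Windows line endings.
        string sourceText = "ab\r\nab\r\nab";

        // Plain $: the first two b's are followed by \r, not by \n,
        // so only the final b (at the end of the string) matches.
        Console.WriteLine("b$        -> " + CountMatches(sourceText, "b$"));

        // The lookahead allows an optional \r before the line end
        // without including it in the match value.
        Console.WriteLine("b(?=\\r?$) -> " + CountMatches(sourceText, @"b(?=\r?$)"));
    }
}
```

With this input, the first pattern should report 1 match and the second 3. Writing the pattern as `b\r?$` would also find three matches, but the \r would then be part of each match value, which is why the lookahead form is used.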

Lucero
A: 

There are two reasons why this is not working. First, as Lucero says, $ matches only before a line feed, and your test string has a carriage return as well as a line feed at the end of each line. Second, you are trying to match b at the end of a line, and only one line of your test string meets that requirement: the first two lines end with >.

What I suspect you want is something more like this:

b(?=(?:<br />)?\r?$)
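
A quick way to try this pattern, sketched under the assumption that the `<br />` text really is part of the string and that lines end with \r\n (the names `BrTagDemo` and `CountLineEndingBs` are illustrative only):

```csharp
using System;
using System.Text.RegularExpressions;

public static class BrTagDemo
{
    // Counts b's that sit at a line end, optionally followed by "<br />" and/or \r.
    public static int CountLineEndingBs(string text)
    {
        return Regex.Matches(text, @"b(?=(?:<br />)?\r?$)", RegexOptions.Multiline).Count;
    }

    public static void Main()
    {
        // Assumed input: the question's string with literal <br /> tags and \r\n endings.
        string withTags = "ab<br />\r\nab<br />\r\nab";

        // The b inside "<br />" is not counted: it is followed by "r />", not a line end.
        Console.WriteLine(CountLineEndingBs(withTags)); // 3
    }
}
```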
Martin Brown
A: 

There seems to be some confusion about what exactly you're applying the regex to. The way it appeared in your original post, the string literal seemed to have literal newlines in it (which compiles only in a verbatim @"..." literal), which the SO software replaced with <BR> tags. If you want a regular string literal to contain newlines, you have to use the appropriate escape sequences, like so:

string sourceText = "ab\nab\nab";

or

string sourceText = "ab\r\nab\r\nab";

In either case, the regex b$ should match all three b's when applied in Multiline mode. I'm not set up to test it myself, but if $ really only matches before \n as that MSDN article says, I would regard it as a serious flaw in .NET regexes. Given a \r\n sequence, $ should match before the \r and not before the \n.
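
For anyone who is set up to test it, the following sketch compares the two literals above under `b$`. It also shows one possible workaround, normalizing \r\n to \n before matching; that step is a suggestion added here, not something from the MSDN article, and the names `LineEndingCheck` and `Count` are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LineEndingCheck
{
    // Hypothetical helper: number of matches of pattern in text, Multiline mode.
    public static int Count(string text, string pattern)
    {
        return Regex.Matches(text, pattern, RegexOptions.Multiline).Count;
    }

    public static void Main()
    {
        string unixStyle = "ab\nab\nab";
        string windowsStyle = "ab\r\nab\r\nab";

        // With bare \n separators, $ finds the end of every line.
        Console.WriteLine(Count(unixStyle, "b$"));     // 3

        // With \r\n separators, the \r sits between b and the position
        // where $ can match, so only the last b is found.
        Console.WriteLine(Count(windowsStyle, "b$"));  // 1

        // Assumed workaround: normalize line endings first, then use plain $.
        string normalized = windowsStyle.Replace("\r\n", "\n");
        Console.WriteLine(Count(normalized, "b$"));    // 3
    }
}
```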

Alan Moore