If I use this pattern:

string showPattern = @"return new_lightox\(this\);"">[a-zA-Z0-9(\s),!\?\-:'&%]+</a>";
MatchCollection showMatches = Regex.Matches(pageSource, showPattern);

I get some matches, but I want to get rid of [a-zA-Z0-9(\s),!\?\-:'&%]+ and use any character (.+) instead. If I do this, I get no match at all.

What am I doing wrong?

+3  A: 

By default "." does not match newlines, but the class \s does.
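
A minimal sketch of that difference, assuming the link text sits on its own line between the tags. The sample markup, the onclick attribute and the class name DotVersusWhitespace are made up for illustration, since the real page source is not shown in the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DotVersusWhitespace
{
    // Hypothetical markup: the link text is surrounded by line breaks.
    public const string Sample =
        "<a onclick=\"return new_lightox(this);\">\nSome Show\n</a>";

    // The dot cannot consume the "\n" that follows the opening tag.
    public static bool DotAlone() =>
        Regex.IsMatch(Sample, @"return new_lightox\(this\);"">.+</a>");

    // \s* absorbs the line breaks on either side of the text.
    public static bool WithWhitespace() =>
        Regex.IsMatch(Sample, @"return new_lightox\(this\);"">\s*.+\s*</a>");

    public static void Main()
    {
        Console.WriteLine(DotAlone());       // False
        Console.WriteLine(WithWhitespace()); // True
    }
}
```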

Berzemus
Thanks, (\s)*.+(\s)* works.
ion
For accuracy, this answer should state that `\s` matches all whitespace, not just newlines.
Peter Boughton
+2  A: 

You're matching a tag, so you probably want something along these lines, instead of .+:

string showPattern = @"return new_lightox\(this\);"">[^<]+</a>";

The match probably fails because the closing tag is on the next line and the Singleline option is not set (Multiline only changes how ^ and $ behave, not the dot). In other words, this should work too:

// The Singleline option makes the dot (.) match newlines too
MatchCollection showMatches = Regex.Matches(
    pageSource,
    showPattern,
    RegexOptions.Singleline);
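
As a rough illustration of the [^<]+ variant, the sketch below captures the link text and trims it. A negated character class matches newlines without any extra option, so no flag is needed here. The sample page source, the onclick attribute and the names NegatedClassDemo and LinkTexts are assumptions made for the example, not part of the question.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NegatedClassDemo
{
    // Returns the trimmed text of every link matched by the pattern.
    public static List<string> LinkTexts(string pageSource)
    {
        // Same pattern as above, with a capture group around the link text.
        string showPattern = @"return new_lightox\(this\);"">([^<]+)</a>";
        var texts = new List<string>();
        foreach (Match m in Regex.Matches(pageSource, showPattern))
        {
            texts.Add(m.Groups[1].Value.Trim());
        }
        return texts;
    }

    public static void Main()
    {
        // Hypothetical page source: one link spans several lines, one does not.
        string pageSource =
            "<a onclick=\"return new_lightox(this);\">\nFirst Show\n</a>\n" +
            "<a onclick=\"return new_lightox(this);\">Second Show</a>";

        foreach (string text in LinkTexts(pageSource))
        {
            Console.WriteLine(text);
        }
    }
}
```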
Abel
+2  A: 

To let . match newlines, turn on Singleline mode (called DOTALL in some other regex flavors), either with a flag in the function call (as Abel's answer shows) or with the inline modifier (?s), like this for the whole expression:

@"(?s)return new_lightox\(this\);"">.+</a>"

Or for just the specific part of it:

@"return new_lightox\(this\);"">(?s:.+)</a>"

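A quick sketch suggesting that the three ways of enabling Singleline mode behave the same on a multi-line link. The sample markup and the class name SinglelineEquivalence are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SinglelineEquivalence
{
    // Hypothetical markup with line breaks around the link text.
    public const string Sample =
        "<a onclick=\"return new_lightox(this);\">\nSome Show\n</a>";

    // Option passed to the method call.
    public static bool ViaOption() =>
        Regex.IsMatch(Sample, @"return new_lightox\(this\);"">.+</a>",
                      RegexOptions.Singleline);

    // Inline modifier that applies to the whole expression.
    public static bool ViaInlineFlag() =>
        Regex.IsMatch(Sample, @"(?s)return new_lightox\(this\);"">.+</a>");

    // Inline modifier scoped to one group only.
    public static bool ViaScopedGroup() =>
        Regex.IsMatch(Sample, @"return new_lightox\(this\);"">(?s:.+)</a>");

    public static void Main()
    {
        Console.WriteLine(ViaOption());      // True
        Console.WriteLine(ViaInlineFlag());  // True
        Console.WriteLine(ViaScopedGroup()); // True
    }
}
```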

It might be better to take that a step further and check, before each character, that it does not start another <a or </a tag:

@"return new_lightox\(this\);"">(?s:(?:(?!</?a).)+)</a>"

This should prevent the match from running past the link's own closing </a> into a different link.
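
To see why this matters, here is a small sketch comparing a greedy .+ in Singleline mode with a version that repeats the negative lookahead before every character. The sample page source, the extra capture group and the names TemperedDotDemo, Greedy and Tempered are assumptions added for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TemperedDotDemo
{
    // Hypothetical page source: the target link is followed by an unrelated link.
    public const string Sample =
        "<a onclick=\"return new_lightox(this);\">First</a>\n" +
        "<a href=\"other.html\">Other</a>";

    // Greedy dot in Singleline mode: runs on to the LAST </a> in the input.
    public static string Greedy(string input) =>
        Regex.Match(input, @"(?s)return new_lightox\(this\);"">(.+)</a>")
             .Groups[1].Value;

    // The lookahead is checked before each character, so the match
    // stops at the first <a or </a it meets.
    public static string Tempered(string input) =>
        Regex.Match(input, @"return new_lightox\(this\);"">(?s:((?:(?!</?a).)+))</a>")
             .Groups[1].Value;

    public static void Main()
    {
        Console.WriteLine(Greedy(Sample));   // spills into the second link
        Console.WriteLine(Tempered(Sample)); // First
    }
}
```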

However, be very wary here. It's not clear what you're doing overall, but regex is not a good tool for parsing HTML and can cause all sorts of problems. Consider using an HTML DOM parser instead, such as HtmlAgilityPack.
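
A rough sketch of what that could look like with HtmlAgilityPack (installed from NuGet). It assumes the links carry the call in an onclick attribute, which the question does not confirm. The XPath expression and the names DomParserDemo and LinkTexts are illustrative only.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // third-party NuGet package

public static class DomParserDemo
{
    // Returns the trimmed text of every <a> whose onclick mentions new_lightox(this).
    public static List<string> LinkTexts(string pageSource)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(pageSource);

        var texts = new List<string>();

        // Assumption: the attribute is onclick; adjust the XPath to the real markup.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(
            "//a[contains(@onclick, 'new_lightox(this)')]");

        // SelectNodes returns null, not an empty collection, when nothing matches.
        if (nodes == null)
        {
            return texts;
        }

        foreach (HtmlNode node in nodes)
        {
            texts.Add(node.InnerText.Trim());
        }
        return texts;
    }
}
```

Because the parser works on the element tree, line breaks inside the link and nested tags no longer affect whether a link is found.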

Peter Boughton