ansaurus

Question

Why does changing this regex class to .+ not provide any match?

Answer 1

+3 A:

By default "." does not match newlines, but the class \s does.
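
A minimal sketch of the difference, assuming a made-up input where the closing tag sits on the next line (the `DotDemo` class and its `Text` value are illustrative, not taken from the question):

```csharp
using System;
using System.Text.RegularExpressions;

public static class DotDemo
{
    // Hypothetical input: a line break separates the link text from </a>.
    public const string Text = "<a>Show name\n</a>";

    // "." stops at '\n' by default, so ".+" cannot reach "</a>" on the next line.
    public static bool DotOnly() => Regex.IsMatch(Text, @"<a>.+</a>");

    // "\s" matches any whitespace, including '\n', so this variant bridges the line break.
    public static bool DotThenWhitespace() => Regex.IsMatch(Text, @"<a>.+\s*</a>");

    public static void Main()
    {
        Console.WriteLine(DotOnly());            // False
        Console.WriteLine(DotThenWhitespace());  // True
    }
}
```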

Berzemus 2010-07-12 09:06:16

Thanks, (\s)*.+(\s)* works.

ion 2010-07-12 09:08:24

For accuracy, this answer should state that `\s` matches all whitespace, not just newlines.

Peter Boughton 2010-07-12 09:26:46

Answer 2

+2 A:

You're matching a tag, so you probably want something along these lines, instead of .+:

string showPattern = @"return new_lightox\(this\);"">[^<]+</a>";

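The match probably fails because the closing tag is on the next line and the Singleline option is not set: by default the dot does not match newlines. (The Multiline option would not help here, since it only changes the meaning of ^ and $.) With Singleline set, the .+ version should work too: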

// The Singleline option changes the dot (.) to match newlines too
MatchCollection showMatches = Regex.Matches(
                              pageSource,
                              showPattern,
                              RegexOptions.Singleline);
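
To compare the two approaches side by side, here is a small sketch. The `PageSource` fragment is invented for illustration (the real page is not shown in the question), and `SinglelineDemo` is just a hypothetical wrapper class:

```csharp
using System;
using System.Text.RegularExpressions;

public static class SinglelineDemo
{
    // Hypothetical page fragment: the closing tag sits on the next line.
    public const string PageSource =
        "<a href=\"#\" onclick=\"return new_lightox(this);\">Show name\n</a>";

    public const string DotPattern = @"return new_lightox\(this\);"">.+</a>";
    public const string ClassPattern = @"return new_lightox\(this\);"">[^<]+</a>";

    // Counts the matches of a pattern against the sample fragment.
    public static int Count(string pattern, RegexOptions options) =>
        Regex.Matches(PageSource, pattern, options).Count;

    public static void Main()
    {
        Console.WriteLine(Count(DotPattern, RegexOptions.None));        // 0
        Console.WriteLine(Count(DotPattern, RegexOptions.Singleline));  // 1
        Console.WriteLine(Count(ClassPattern, RegexOptions.None));      // 1
    }
}
```

The negated class [^<] needs no flag because it matches any character except "<", newlines included.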

Abel 2010-07-12 09:08:16

Answer 3

+2 A:

To let . match newlines, turn on Singleline mode (called DOTALL in some other regex flavours), either with a flag in the function call (as Abel's answer shows) or with the inline modifier (?s). For the whole expression:

@"(?s)return new_lightox\(this\);"">.+</a>"

Or for just the specific part of it:

@"return new_lightox\(this\);"">(?s:.+)</a>"
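
A short sketch of how the two inline forms differ in scope, using a made-up three-line input (`InlineModifierDemo` and `Input` are illustrative names only):

```csharp
using System;
using System.Text.RegularExpressions;

public static class InlineModifierDemo
{
    // Hypothetical input spanning three lines.
    public const string Input = "start\nmiddle\nend";

    // Whole-pattern modifier: every dot may match '\n'.
    public static bool Whole() => Regex.IsMatch(Input, @"(?s)start.+end");

    // Scoped modifier: only the dot inside the group may match '\n'.
    public static bool Scoped() => Regex.IsMatch(Input, @"start(?s:.+)end");

    // The dots outside the scoped group keep the default behaviour,
    // so this fails at the first '\n'.
    public static bool OutsideScope() => Regex.IsMatch(Input, @"start.(?s:middle).end");

    public static void Main()
    {
        Console.WriteLine(Whole());         // True
        Console.WriteLine(Scoped());        // True
        Console.WriteLine(OutsideScope());  // False
    }
}
```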

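It might be better to take that a step further and test a negative lookahead before every character:

@"return new_lightox\(this\);"">(?s:(?:(?!</?a).)+)</a>"

This should prevent the closing </a> from belonging to a different link. The lookahead has to sit inside the repeated group: written as (?!</?a).+ it would be checked only once, at the first character, and the greedy .+ could still run past an earlier </a>.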
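
To see why the position of the lookahead matters, here is a sketch against an invented input containing two links (`LookaheadDemo`, `Input`, and the two pattern names are hypothetical). One pattern checks the lookahead once before a greedy .+, the other checks it before every character:

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaheadDemo
{
    // Hypothetical input: the wanted link is followed by an unrelated one.
    public const string Input =
        "<a onclick=\"return new_lightox(this);\">First</a> <a href=\"x\">Second</a>";

    // Lookahead checked once, then a greedy ".+" runs to the last </a>.
    public const string CheckedOnce =
        @"new_lightox\(this\);"">(?s:(?!</?a).+)</a>";

    // Lookahead checked before each character, so the match stops at the first </a>.
    public const string CheckedPerChar =
        @"new_lightox\(this\);"">(?s:(?:(?!</?a).)+)</a>";

    // Returns the full text matched by the given pattern.
    public static string Matched(string pattern) => Regex.Match(Input, pattern).Value;

    public static void Main()
    {
        Console.WriteLine(Matched(CheckedOnce).Contains("Second"));     // True
        Console.WriteLine(Matched(CheckedPerChar).Contains("Second"));  // False
    }
}
```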

However, be very wary here. It's not clear what you're doing overall, but regex is not a good tool for parsing HTML and can cause all sorts of problems. Consider using an HTML DOM parser instead, such as HtmlAgilityPack.

Peter Boughton 2010-07-12 09:17:48
