Hi, I'm trying to use a regular expression I found on this website to strip script tags from a string, and it doesn't seem to work. Any ideas?

Input string:

sFetch = "123<script type=\"text/javascript\">\n\t\tfunction utmx_section(){}function utmx(){}\n\t\t(function()})();\n\t</script>456";

Regex:

sFetch = Regex.Replace(sFetch, "<script.*?>.*?</script>", "", RegexOptions.IgnoreCase);

Thanks!!!

+4  A: 

Add RegexOptions.Singleline

RegexOptions.IgnoreCase | RegexOptions.Singleline

But even then it will never work on something like the following:

<script
>
alert(1)
</script
/**/
>

So, find an HTML parser such as HTML Agility Pack.
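
To make the difference concrete, here is a minimal sketch that runs the pattern from the question with and without `RegexOptions.Singleline`, and then against the oddly formatted tag above. The class name `ScriptStripDemo` and the helper `Strip` are just names made up for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ScriptStripDemo
{
    // Pattern taken unchanged from the question.
    private const string Pattern = "<script.*?>.*?</script>";

    // Hypothetical helper: applies the pattern with the given options.
    public static string Strip(string html, RegexOptions options)
    {
        return Regex.Replace(html, Pattern, "", options);
    }

    public static void Main()
    {
        string sFetch = "123<script type=\"text/javascript\">\n\t\tfunction utmx_section(){}function utmx(){}\n\t\t(function()})();\n\t</script>456";

        // Without Singleline, '.' does not match '\n', so nothing is removed.
        Console.WriteLine(Strip(sFetch, RegexOptions.IgnoreCase) == sFetch); // True

        // With Singleline, '.' also matches '\n' and the whole element goes away.
        Console.WriteLine(Strip(sFetch, RegexOptions.IgnoreCase | RegexOptions.Singleline)); // 123456

        // The end tag here is split across lines, so the literal "</script>"
        // never appears and the regex finds nothing to replace.
        string odd = "<script\n>\nalert(1)\n</script\n/**/\n>";
        Console.WriteLine(Strip(odd, RegexOptions.IgnoreCase | RegexOptions.Singleline) == odd); // True
    }
}
```

The last case is why a real HTML parser is the safer choice for input you don't control.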

S.Mark
Thanks. Any other recommendations for C# packages like Agility Pack for parsing HTML?
amitre
`Singleline` is the option you want; it allows `.` to match linefeeds. `Multiline` causes `$` and `^` to match before and after (respectively) linefeeds; it's irrelevant here.
Alan Moore
@Alan, you're right! Fixed!
S.Mark
+1  A: 

The reason the regex fails is that your input contains newlines, and by default the metacharacter . does not match a newline.

To solve this you can use the RegexOptions.Singleline option as S.Mark says, or you can change the regex to:

"<script[\d\D]*?>[\d\D]*?</script>"

which uses [\d\D] instead of the dot.

\d is any digit and \D is any non-digit, so [\d\D] is a digit or a non-digit which is effectively any char.
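
As a quick check, this sketch applies the `[\d\D]` version to the input from the question without setting `RegexOptions.Singleline`. The class name `AnyCharClassDemo` and the method `Strip` are invented for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnyCharClassDemo
{
    // [\d\D] matches any character, including '\n', regardless of options.
    public static string Strip(string html)
    {
        return Regex.Replace(html, @"<script[\d\D]*?>[\d\D]*?</script>", "", RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        string sFetch = "123<script type=\"text/javascript\">\n\t\tfunction utmx_section(){}function utmx(){}\n\t\t(function()})();\n\t</script>456";

        // No Singleline needed: the character class already crosses line breaks.
        Console.WriteLine(Strip(sFetch)); // 123456
    }
}
```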

codaddict
Thanks. Is this a solution also for nested script tags?
amitre
A: 

This is a bit shorter:

 "<script[^<]*</script>"

or, if the input contains no tags other than the script element:

"<[^>]*>[^>]*>"
instcode
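
For anyone trying these two shorter patterns, here is a rough sketch of how they behave. Both handle the input from the question, but the test strings `withLessThan` and `otherTags` are my own assumptions, added to show where each pattern stops doing what you'd expect. The class and method names are made up as well.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ShortPatternDemo
{
    // First short pattern: the script body must not contain '<'.
    public static string StripNoLessThan(string html)
    {
        return Regex.Replace(html, "<script[^<]*</script>", "");
    }

    // Second short pattern: matches any tag followed by text up to the next '>'.
    public static string StripAnyTagPair(string html)
    {
        return Regex.Replace(html, "<[^>]*>[^>]*>", "");
    }

    public static void Main()
    {
        string sFetch = "123<script type=\"text/javascript\">\n\t\tfunction utmx_section(){}function utmx(){}\n\t\t(function()})();\n\t</script>456";

        // Negated character classes match '\n', so both work on the question's input.
        Console.WriteLine(StripNoLessThan(sFetch)); // 123456
        Console.WriteLine(StripAnyTagPair(sFetch)); // 123456

        // Assumed input: a '<' inside the script body stops [^<]* early, so nothing is removed.
        string withLessThan = "<script>if (a < b) {}</script>";
        Console.WriteLine(StripNoLessThan(withLessThan) == withLessThan); // True

        // Assumed input: the second pattern is not specific to script and removes other tags too.
        string otherTags = "<b>bold</b> text";
        Console.WriteLine(StripAnyTagPair(otherTags)); // prints " text"
    }
}
```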
Thanks. Is this a solution also for nested script tags?
amitre
Yes, absolutely, because script tags are never nested.
instcode