Use the HTML Agility Pack; see my answer here: http://stackoverflow.com/questions/1569917/how-do-i-parse-html-using-regular-expressions-in-c/1569970#1569970
nickyt
2009-10-20 16:44:56
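You don't want parentheses around the .* that the question mark follows. With the parentheses, the ? makes the whole group optional, so this grabs everything greedily, or nothing at all:
(.*)?
Without them, the ? makes the star lazy, so this grabs as little as possible:
.*?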
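To make the difference concrete, here is a minimal sketch in Python rather than .NET; these two constructs behave the same way in both engines. The sample string and variable names are made up for illustration.

```python
import re

# Made-up sample input with two adjacent elements.
text = "<b>one</b><b>two</b>"

# (.*)? is an optional group around a greedy .*: it grabs as much as it can,
# then backtracks only far enough for the trailing </b> to match.
greedy_optional = re.search(r"<b>(.*)?</b>", text)

# .*? is a lazy star: it grabs as little as it can before the first </b>.
lazy = re.search(r"<b>(.*?)</b>", text)

print(greedy_optional.group(1))  # one</b><b>two
print(lazy.group(1))             # one
```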
Also, the ending token in your regex has only one - where the comment terminator needs two (-->). Change it to this:
<!-- InstanceBeginEditable.*?-->(?<content>.*?)<!-- InstanceEnd
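By the way, it's risky to have two .* quantifiers in a regex without an atomic group. On unexpected data, you can get catastrophic backtracking. I'd recommend changing the first .*? to [^-]* so that it cannot run past the end of the opening comment. Note that this assumes nothing between InstanceBeginEditable and the closing --> contains a hyphen; a region name such as main-content would stop it from matching. And, while I'm at it, I'll suggest you handle whitespace more forgivingly: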
<!--\s*InstanceBeginEditable[^-]*-->(?<content>.*?)<!--\s*InstanceEnd
You probably already know this, but let me add that with .NET, you'll need to use RegexOptions.Singleline so that . also matches newlines; otherwise the content group cannot span more than one line.
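As a rough check of the final pattern, here is a sketch in Python rather than C#. Two things differ from .NET and are assumptions of this translation: Python writes the named group as (?P<content>...) instead of (?<content>...), and re.DOTALL plays the role of RegexOptions.Singleline. The template fragment is invented for the example.

```python
import re

# Python spells the named group (?P<content>...); .NET uses (?<content>...).
PATTERN = re.compile(
    r"<!--\s*InstanceBeginEditable[^-]*-->(?P<content>.*?)<!--\s*InstanceEnd",
    re.DOTALL,  # counterpart of RegexOptions.Singleline: . also matches \n
)

# Hypothetical template fragment, made up for this example.
html = """<body>
<!-- InstanceBeginEditable name="main" -->
<p>Hello</p>
<p>World</p>
<!-- InstanceEndEditable -->
</body>"""

match = PATTERN.search(html)
content = match.group("content").strip()
print(content)

# Same pattern without DOTALL: . stops at the first newline, so the
# multi-line region is not found at all.
no_dotall = re.search(PATTERN.pattern, html)
print(no_dotall)  # None
```

In C#, the corresponding call would be Regex.Match with RegexOptions.Singleline, reading the result from match.Groups["content"].Value.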