ansaurus

Question

Parsing Dreamweaver templates with Regular Expressions

Answer 1

A:

Use the HTML Agility Pack; see my answer here: http://stackoverflow.com/questions/1569917/how-do-i-parse-html-using-regular-expressions-in-c/1569970#1569970

nickyt 2009-10-20 16:44:56

Does HTML Agility support sections surrounded by special comments, as in this question? I'm already trying to use Agility for this but can't see how to select anything other than normal nodes.

Dan Revell 2010-09-28 11:23:57

Answer 2

+1 A:

You don't want to put parentheses around .*.

This means to grab everything greedily, or nothing at all (the trailing ? makes the group optional; it does not make .* lazy):

(.*)?

This means to grab everything lazily:

.*?
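To see the difference concretely, here is a minimal sketch in Python. The answer targets .NET, so Python's re module is only a stand-in here, and the sample string is made up for illustration; the greedy and lazy quantifiers behave the same way in both engines.

```python
import re

# Hypothetical sample: three comment markers on one line.
sample = "<!-- a -->one<!-- b -->two<!-- c -->"

# Greedy: .* runs as far as possible, up to the LAST "<!--".
greedy = re.search(r"-->(.*)<!--", sample).group(1)

# Lazy: .*? stops as early as possible, at the FIRST "<!--".
lazy = re.search(r"-->(.*?)<!--", sample).group(1)

# (.*)? is still greedy; the trailing ? only makes the group optional.
optional_group = re.search(r"-->(.*)?<!--", sample).group(1)

print(greedy)          # one<!-- b -->two
print(lazy)            # one
print(optional_group)  # one<!-- b -->two
```

The last case is the one from the question: wrapping .* in parentheses before adding ? changes what the ? applies to, so the match is just as greedy as before.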

Also, in your regex, you have only one - in the ending token. Change it to this:

<!-- InstanceBeginEditable.*?-->(?<content>.*?)<!-- InstanceEnd

By the way, it's dangerous to have two .*s in a regex without an atomic group. On unexpected data, you can get catastrophic backtracking. I'd recommend changing the first .*? to [^-]*. And, while I'm at it, I'll suggest you handle whitespace more forgivingly:

<!--\s*InstanceBeginEditable[^-]*-->(?<content>.*?)<!--\s*InstanceEnd

You probably already know this, but let me add that with .NET, you'll need to use RegexOptions.Singleline.
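As a rough illustration of the final pattern in use, here is a Python sketch rather than .NET code. Two things differ and are assumptions of this example: Python spells the named group (?P<content>...) instead of (?<content>...), and re.DOTALL plays the role of RegexOptions.Singleline. The template fragment is invented for the example.

```python
import re

PATTERN = r"<!--\s*InstanceBeginEditable[^-]*-->(?P<content>.*?)<!--\s*InstanceEnd"

# re.DOTALL lets . match newlines, like RegexOptions.Singleline in .NET.
EDITABLE = re.compile(PATTERN, re.DOTALL)

# Hypothetical template fragment, invented for this example.
template = """<html><body>
<!-- InstanceBeginEditable name="title" -->
<h1>Hello</h1>
<!-- InstanceEndEditable -->
<!--InstanceBeginEditable name="body" --><p>Text</p><!--InstanceEndEditable -->
</body></html>"""

# Content of every editable region, in document order.
regions = [m.group("content") for m in EDITABLE.finditer(template)]

# Same pattern without the flag: regions spanning several lines are missed.
single_line_only = re.findall(PATTERN, template)

print(regions)           # ['\n<h1>Hello</h1>\n', '<p>Text</p>']
print(single_line_only)  # ['<p>Text</p>']
```

One caveat with [^-]*: it stops at the first hyphen, so a region whose name contains a hyphen (for example name="main-content") would not match this pattern.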

Jeremy Stein 2009-10-20 16:46:07

Hi Jeremy, the single - in the end token was courtesy of Word, but thanks for noticing!

Greg B 2009-10-20 18:45:37

Thanks for the info on greediness/laziness. I had thought of using \s for whitespace, but while I was trying to get it working I thought I'd keep it simple with a literal space. Cheers

Greg B 2009-10-20 18:52:34
