ansaurus

Question

Answer 1

A:

Dear sweet jebus don't do this: reason

Woot4Moo 2010-09-04 15:33:29

Answer 2

+1 A:

You can't. Arbitrarily nested tags do not form a regular language, so this problem is unsolvable with classic regular expressions and with most existing regex implementations.

However, some regex engines have special support for balanced pair matching; .NET's balancing groups are one example. Even then, your regex will handle only a subset of syntactically correct inputs (e.g., what if a </div> is embedded in a comment?). You need an HTML parser to get reliable results.
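To make the parser suggestion concrete, here is a minimal sketch in Python using the standard library's html.parser. It assumes the goal is to extract the contents of the first div with a given class (the class name "c2" is taken from the sample in a later answer; DivUnwrapper and unwrap are hypothetical names). It tracks div nesting depth, so a </div> inside a comment does not confuse it.

```python
from html.parser import HTMLParser


class DivUnwrapper(HTMLParser):
    """Collect the markup found inside the first <div> carrying target_class."""

    def __init__(self, target_class):
        super().__init__(convert_charrefs=False)
        self.target_class = target_class
        self.depth = 0      # div nesting depth inside the target; 0 means outside
        self.done = False   # set once the target div has been closed
        self.parts = []

    def _keep(self, text):
        # Record text only while we are inside the target div.
        if self.depth and not self.done:
            self.parts.append(text)

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.depth == 0:
            classes = (dict(attrs).get("class") or "").split()
            if tag == "div" and self.target_class in classes:
                self.depth = 1  # entered the wrapper; do not record its own tag
            return
        if tag == "div":
            self.depth += 1
        self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> do not change the depth.
        self._keep(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.done or self.depth == 0:
            return
        if tag == "div":
            self.depth -= 1
            if self.depth == 0:
                self.done = True  # this is the wrapper's own closing tag
                return
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        self._keep(data)

    def handle_comment(self, data):
        # Comment text arrives here, so a "</div>" inside it is never a tag.
        self._keep(f"<!--{data}-->")

    def handle_entityref(self, name):
        self._keep(f"&{name};")

    def handle_charref(self, name):
        self._keep(f"&#{name};")


def unwrap(html, target_class):
    parser = DivUnwrapper(target_class)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)
```

For example, unwrap('<div class="c2"><p>x</p><!-- </div> --></div>', "c2") keeps the comment intact and drops only the wrapper. Stray closing tags after the wrapper are simply ignored, which is one way of coping with input that is not well formed.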

atzz 2010-09-04 15:38:35

You could handle comments with a regular expression implementation that supports recursive patterns too.

Gumbo 2010-09-04 15:44:43

@Gumbo - hmm, probably... But what if the source is not syntactically correct? Personally, I wouldn't be comfortable with a solution that has to explicitly take care of each possibility (what if I miss some?). I'd prefer a (maybe specialized, simplified) parser.

atzz 2010-09-04 16:10:01

Answer 3

A:

Any chance this will always be valid XHTML? If so, you'd be better off parsing it as XML than trying to regex this.
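If the input really is well-formed XHTML, the XML route might look like the sketch below, which uses Python's xml.etree.ElementTree. It assumes the document has a single root element and no XML namespace on the tags (a namespaced XHTML document would need qualified tag names); inner_xml is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def inner_xml(xhtml, target_class):
    """Return the serialized children of the first <div class=target_class>.

    Raises ET.ParseError if the input is not well-formed XML.
    Returns None if no matching div is found.
    """
    root = ET.fromstring(xhtml)
    for el in root.iter("div"):  # iter() also visits the root itself
        if el.get("class") == target_class:
            # el.text is the text before the first child; tostring() on each
            # child includes that child's trailing text.
            children = "".join(ET.tostring(c, encoding="unicode") for c in el)
            return (el.text or "") + children
    return None
```

Input with unbalanced tags, such as extra closing </div> tags, is rejected with ET.ParseError rather than silently mangled, which is the main trade-off of this approach.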

mattmc3 2010-09-04 15:40:29

Answer 4

A:

Delete the first line, delete the last line. Problem solved. No need for RegEx.

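If you do want a regex, the following pattern works for this input. Note that its syntax ({...} for a tagged group, escaped \< and \>) is that of the Visual Studio Find and Replace dialect; with the .NET System.Text.RegularExpressions classes you would use parentheses for the group and leave the angle brackets unescaped:

\<div class="c2"\>{[\n a-z.<>="0-9/]+}\</div\>

Replace the match with \1, the tagged group.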

Input:

<div class="c2">
<div class="c3">
<p>...</p>
</div></div></div></div></div></div></div></div>
</div>

Output:

<div class="c3">
<p>...</p>
</div></div></div></div></div></div></div></div>
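Both suggestions in this answer can be tried on the sample input. The sketch below is a rough Python translation, not the author's original tooling: parentheses stand in for the tagged group, and the names PATTERN, SAMPLE, unwrap_with_regex and unwrap_by_lines are hypothetical.

```python
import re

# Same character class as the pattern above; the greedy + runs to the end of
# the text and backtracks to the LAST "</div>", which is why it works here.
PATTERN = re.compile(r'<div class="c2">([\n a-z.<>="0-9/]+)</div>')


def unwrap_with_regex(text):
    # Replace the whole match with the captured group (the inner markup).
    return PATTERN.sub(r"\1", text, count=1)


def unwrap_by_lines(text):
    # "Delete the first line, delete the last line."
    return "\n".join(text.splitlines()[1:-1])


SAMPLE = (
    '<div class="c2">\n'
    + '<div class="c3">\n'
    + '<p>...</p>\n'
    + '</div>' * 8 + '\n'
    + '</div>'
)
```

On this sample both functions return the same inner markup, apart from the leading and trailing newline that the regex version keeps. The character class is tailored to the sample: uppercase letters, punctuation outside the class, or any further markup after the wrapper's closing tag would change what the pattern matches.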

Denis Valeev 2010-09-04 15:55:52


A regular expression question
