ansaurus

Question

Regular expression to find text not part of a hyperlink

Answer 1

+2 A:

Don't do it! See Jeff Atwood's Parsing Html The Cthulhu Way!
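As a rough illustration of the parser-based alternative, here is a minimal sketch using Python's standard-library html.parser. The language, the class name UnlinkedTextFinder and the helper find_unlinked are assumptions made for this example, not anything from the question, which targets .NET. It tracks whether the parser is currently inside an anchor and keeps only text nodes found outside one.

```python
from html.parser import HTMLParser


class UnlinkedTextFinder(HTMLParser):
    """Collects text nodes that contain `needle` and sit outside any <a> element."""

    def __init__(self, needle):
        super().__init__()
        self.needle = needle
        self.anchor_depth = 0  # greater than 0 while inside <a>...</a>
        self.hits = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.anchor_depth += 1

    def handle_endtag(self, tag):
        if tag == "a" and self.anchor_depth > 0:
            self.anchor_depth -= 1

    def handle_data(self, data):
        # Only record text that is not enclosed by an anchor at any depth.
        if self.anchor_depth == 0 and self.needle in data:
            self.hits.append(data)


def find_unlinked(html, needle):
    """Hypothetical helper: return the text nodes outside links that contain needle."""
    finder = UnlinkedTextFinder(needle)
    finder.feed(html)
    finder.close()
    return finder.hits


page = '<p>See BugID 12.</p><a href="..."><span>BugID 12</span></a>'
print(find_unlinked(page, "BugID 12"))
```

Because the parser tracks nesting, an occurrence wrapped in other tags inside the link is still treated as linked, which is the case a lookahead-based pattern tends to get wrong.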

David Pfeffer 2010-03-06 15:44:22

And here I thought all things Cthulhu were good and holy. How wrong I was!

Jason D 2010-03-07 08:07:05

I re-read this article and am considering its implications for my decision. I didn't really want to add a new framework to my application just to deal with HTML, but I understand Jeff's point.

mk 2010-03-10 09:58:21

Answer 2

+1 A:

.NET supports negative lookaheads, so you can use:

(BugID 12)(?!</a>)  // match BugID 12 only if it is not immediately followed by a closing anchor tag.

However, there is still the danger that BugID 12 will be inside an anchor like:

<a href="...">Something BugID 12 Something</a>

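But you can mostly overcome this with:

(BugID 12)(?![\w\s]*</a>)  // [\w\s]* matches any word characters or spaces between the string and the end tag. It matches the same strings as (?:\s*\w*)*, without the nested quantifiers that can backtrack badly on long input.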

Disclaimer: Parsing HTML with regex is not reliable and should only be done as a last resort, or in the simplest of cases. There are plenty of inputs where the above expression does not behave as desired. For example, in BugID 12</span></a> the text is followed by a tag other than </a>, so the lookahead does not reject it even though it sits inside a link.
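To make the behaviour concrete, here is a small sketch that runs the lookahead pattern against three inputs. It is written in Python as an assumption (its re module accepts the same lookahead syntax as .NET for this pattern), and the helper name finds_unlinked is made up for the example. The run of words and spaces before the closing tag is written as the single character class [\w\s]*, which matches the same strings as a nested group like (?:\s*\w*)*.

```python
import re

# [\w\s]* covers the words and spaces between the match and a closing </a>.
NOT_BEFORE_CLOSE = re.compile(r"(BugID 12)(?![\w\s]*</a>)")


def finds_unlinked(html):
    """Return True if the pattern reports an occurrence it considers unlinked."""
    return NOT_BEFORE_CLOSE.search(html) is not None


samples = [
    "See BugID 12 for details.",                       # plain text
    '<a href="...">Something BugID 12 Something</a>',  # inside a link
    '<a href="..."><span>BugID 12</span></a>',         # inside a link, wrapped in a span
]

for html in samples:
    print(finds_unlinked(html), html)
```

The first sample is reported and the second is rejected, as intended. The third is reported even though the text is inside a link, because </span> sits between the text and </a>. That is the kind of false positive the disclaimer warns about.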

Joel Potter 2010-03-06 15:53:47

Thank you; this has given me enough to go on.

mk 2010-03-10 09:57:31
