ansaurus

Question

Only match word outside a HTML statement with a regex

Answer 1

+3 A:

Don't parse HTML with regular expressions. Use XPath instead. PHP supports it through its DOM extension (DOMDocument and DOMXPath).

The XPath expression for what you want is pretty straightforward. Assuming the tag that you want to search inside is a div, this is what you want:

//div/text()[contains(.,'foo')]

Once you have the text node, you can run a regular expression on it without the fear of it containing any HTML tags.
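As a rough illustration of the "get the text nodes first, then run the regex" workflow, here is a minimal sketch. It is an assumption-laden stand-in, not the PHP/XPath code itself: it uses Python's standard-library html.parser instead of an XPath engine, and the names DivTextCollector and find_in_div_text are made up for this example. Like //div/text(), it only looks at text that is a direct child of a div.

```python
import re
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not stay on the stack.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class DivTextCollector(HTMLParser):
    """Collect text nodes whose parent element is a <div> (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.stack = []   # names of currently open elements, innermost last
        self.texts = []   # text nodes that are direct children of a <div>

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray closing tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        if self.stack and self.stack[-1] == "div":
            self.texts.append(data)


def find_in_div_text(html, pattern):
    """Run `pattern` only against text nodes directly inside a <div>."""
    parser = DivTextCollector()
    parser.feed(html)
    parser.close()
    regex = re.compile(pattern)
    return [m.group(0) for text in parser.texts for m in regex.finditer(text)]


sample = '<div>foo <a href="foo.html">foo link</a> bar foo</div>'
matches = find_in_div_text(sample, r"\bfoo\b")
```

With this sample, neither the "foo" in the href attribute nor the one in the link text is matched, because neither lives in a text node directly under the div.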

Welbog 2009-12-23 14:11:12

Good point, but in this case there will be only some links with a fixed format and no other HTML in the text. So using XPath might be overkill.

Silverscreen 2009-12-23 14:30:11

Well, using regular expressions is *impossible*. I'll take overkill over impossible any day of the week.

Welbog 2009-12-23 14:49:51

Answer 2

A:

You could count the number of opening and closing angle brackets (< and >) that have been encountered so far. If the counts differ, it means that you've opened a bracket without having yet encountered the closing one, which means you're presently inside an HTML tag.

However, note that in general, using regular expressions for HTML parsing is a terrible idea.
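The bracket-counting idea could be sketched roughly like this. It is only a sketch under the assumption that every < and > in the input belongs to a tag; the function name words_outside_tags is invented for the example.

```python
import re


def words_outside_tags(html, word):
    """Return start offsets of `word` where the '<' and '>' counts so far are equal."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b")
    hits = []
    for m in pattern.finditer(html):
        before = html[:m.start()]
        # Equal counts mean every opened bracket has been closed,
        # so the match is not inside a tag.
        if before.count("<") == before.count(">"):
            hits.append(m.start())
    return hits


sample = '<a href="foo.html">foo</a> foo'
offsets = words_outside_tags(sample, "foo")
```

Here the "foo" inside the href attribute is skipped, while the link text and the trailing word are reported. The assumption breaks easily: an input such as <img alt="a > b foo"> has a literal > inside an attribute value, so the counts balance early and the "foo" inside the tag is wrongly reported.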

John Feminella 2009-12-23 14:11:41
