I have an HTML source in a string variable, and a word in another variable that should be highlighted in that HTML source.

I need a regular expression that does not highlight the tags themselves, only the text within them.

For example I have a html source like

<cfset html =  "<span>Text goes here, forr example it container also **span** </span>" />
<cfset wordToReplace = "span" />
<cfset html = ReReplace(html ,"[^(<#wordToReplace#\b[^>]*>)]","replaced","ALL")>

and what I want to get is

<span>Text goes here, forr example it container also **replaced** </span>

But I get an error. Any tips?

+1  A: 

What you have to do is use a lookahead to make sure that your text isn't contained within a tag. Granted, this could probably be written better, but it will get you the results you want. It will even handle tags that have attributes.

<cfset html =  "<span class='me'>Text goes here, forr example it container also **span** </span>" />
<cfset wordToReplace = "span" />
<cfset html = ReReplace(html ,"(?!/?<)(#wordToReplace#)(?![^.*>]*>)","replaced","ALL")>
rip747
Your `(?!/?<)` is back-to-front (will match `/<`) and the `.*` inside `[^.*>]` are literal characters. But even corrected, for a trivial example of how this doesn't work... consider what `<img title="spanish > forever" src="span.png" alt="spanish flag" />` will become.
Peter Boughton
+3  A: 

I need a regular expression that does not highlight the tags themselves, only the text within them.

You won't find one, at least not one that is fully reliable against all legal (and real-world) HTML.

The simple reason is that Regular Expressions match Regular languages, and HTML is not even remotely a Regular language.

Even if you're very careful, you run the risk of replacing stuff you didn't want to, and not replacing stuff you did want to, simply due to how complicated HTML syntax can be.
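
To make that risk concrete, here is a minimal sketch in Python (used instead of CFML purely so it is easy to run). The pattern and the `naive_highlight` helper are hypothetical: a tidied-up version of the "look ahead for a closing bracket" idea, not anyone's actual code. It replaces the word only when the next angle bracket after it is not a `>`.

```python
import re

WORD = "span"

# Hypothetical "careful" pattern: match the whole word, but only if the
# next angle bracket after it is not '>' (i.e. we appear to be outside a tag).
PATTERN = re.compile(r"\b" + re.escape(WORD) + r"\b(?![^<>]*>)")


def naive_highlight(html: str) -> str:
    """Replace WORD wherever the regex believes it is outside a tag."""
    return PATTERN.sub("replaced", html)


# Works on the simple case from the question.
ok = naive_highlight("<span class='me'>also span </span>")

# A '<' inside an attribute value fools the lookahead: the attribute is rewritten.
false_positive = naive_highlight('<img alt="span < more">')

# A literal '>' in the text fools it the other way: the text is left alone.
false_negative = naive_highlight("<p>span > other</p>")

print(ok)
print(false_positive)
print(false_negative)
```

The first call behaves as hoped, the second changes an attribute value, and the third skips text it should have replaced, because the pattern can only guess at where tags begin and end.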


The correct way to parse HTML is using a purpose-built HTML DOM parser.

Annoyingly, CF doesn't have one built in, though if your HTML is XHTML you can use XmlParse and XmlSearch to run an XPath search for only the text (not the tags) that matches your word. Something like //*[contains(text(), 'span')] should do.
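
As a rough illustration of the XML route, here is a minimal sketch in Python, with `xml.etree.ElementTree` standing in for CF's XmlParse/XmlSearch. It assumes the input is well-formed XHTML with a single root element, and `replace_in_text_nodes` is a hypothetical helper name. It walks every element and touches only the text between tags, never tag names or attributes.

```python
import re
import xml.etree.ElementTree as ET


def replace_in_text_nodes(xhtml: str, word: str, replacement: str) -> str:
    """Replace whole-word matches in text content only (assumes well-formed XHTML)."""
    root = ET.fromstring(xhtml)
    pattern = re.compile(r"\b" + re.escape(word) + r"\b")

    def swap(text: str) -> str:
        # A function replacement avoids backslash interpretation in `replacement`.
        return pattern.sub(lambda match: replacement, text)

    for element in root.iter():
        # `text` is the content before the first child element,
        # `tail` is the content that follows the element's closing tag.
        if element.text:
            element.text = swap(element.text)
        if element.tail:
            element.tail = swap(element.tail)

    return ET.tostring(root, encoding="unicode")


result = replace_in_text_nodes(
    '<span class="span">Text with a span <b>and span</b> in it</span>',
    "span",
    "replaced",
)
print(result)
```

Note that the output is re-serialized from the parsed tree, so details such as attribute quoting may be normalized compared with the input.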

If you've not got XHTML then you'll need to look at using an HTML DOM parser for Java. Google turns up plenty (I've not tried any yet, so I can't give any specific recommendations).
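
To show the general shape of a parser-based approach for non-XHTML input, here is a minimal sketch that swaps in Python's standard-library `html.parser` in place of a Java library. It is a tokenizer rather than a full DOM parser, and the `TextOnlyReplacer` class and `replace_outside_tags` function are hypothetical names. The idea is the same either way: let the parser decide what is a tag and what is text, and only run the replacement on text.

```python
import re
from html.parser import HTMLParser


class TextOnlyReplacer(HTMLParser):
    """Re-emits the document, replacing a word only in text outside tags."""

    def __init__(self, word: str, replacement: str):
        super().__init__(convert_charrefs=False)
        self.pattern = re.compile(r"\b" + re.escape(word) + r"\b")
        self.replacement = replacement
        self.out = []
        self.raw_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        self.out.append(self.get_starttag_text())  # original tag text, untouched
        if tag in ("script", "style"):
            self.raw_depth += 1

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())  # e.g. <img ... />

    def handle_endtag(self, tag):
        self.out.append("</" + tag + ">")  # tag name comes back lowercased
        if tag in ("script", "style") and self.raw_depth:
            self.raw_depth -= 1

    def handle_data(self, data):
        if self.raw_depth == 0:  # leave script and style bodies alone
            data = self.pattern.sub(lambda match: self.replacement, data)
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&" + name + ";")

    def handle_charref(self, name):
        self.out.append("&#" + name + ";")

    def handle_comment(self, data):
        self.out.append("<!--" + data + "-->")

    def handle_decl(self, decl):
        self.out.append("<!" + decl + ">")

    def handle_pi(self, data):
        self.out.append("<?" + data + ">")


def replace_outside_tags(html: str, word: str, replacement: str) -> str:
    parser = TextOnlyReplacer(word, replacement)
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.out)


sample = ('<p class="span">A span here <img title="spanish > forever" '
          'src="span.png" alt="spanish flag" /></p>')
print(replace_outside_tags(sample, "span", "replaced"))
```

On this sample the `img` tag, including the `>` inside its title attribute, should pass through unchanged while the word in the paragraph text is replaced. A real DOM parser would also cope better with badly broken markup than this sketch does.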

Peter Boughton
+1 - the quoted part of the question amounts to, "How can I make the wrong tool for the job do the job?"
Joel Mueller