Hey,

I'm using the code below to highlight some keywords in a text:

$message = str_ireplace($words,'<span class="hightlighted_text">'.$words.'</span>',$message);

The text may contain some HTML tags, for example <img>, etc.

How can I highlight only the "normal" text and skip the text inside the HTML tags? When a user searches for "img", the tag name itself gets highlighted and the image no longer works.

A: 

Use a DOM parser of some sort. This is not something you want to do with regex.
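
As a rough sketch of the DOM-parser approach, the snippet below uses PHP's built-in DOMDocument and DOMXPath rather than any particular third-party parser. The helper name highlightTextNodes(), the wrapper id "hl-root", and the sample message are assumptions made up for illustration; none of them come from this thread. It only touches text nodes, so tag names and attribute values are left alone.

```php
<?php
// Hypothetical helper for illustration only; not from the thread.
function highlightTextNodes(string $html, array $keywords): string
{
    $keywords = array_filter($keywords, 'strlen'); // ignore empty keywords
    if (!$keywords) {
        return $html;
    }
    // One case-insensitive alternation covering every keyword.
    $quoted  = array_map(function ($w) { return preg_quote($w, '/'); }, $keywords);
    $pattern = '/(' . implode('|', $quoted) . ')/i';

    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence warnings on sloppy markup
    // Wrap the fragment in a known container (assumed id "hl-root") so it can be
    // extracted again; the XML encoding hint makes libxml read the input as UTF-8.
    $doc->loadHTML('<?xml encoding="utf-8"?><div id="hl-root">' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Snapshot the text nodes first, because the loop below modifies the tree.
    $nodes = iterator_to_array($xpath->query('//div[@id="hl-root"]//text()'));
    foreach ($nodes as $node) {
        // Odd-indexed parts are keyword matches, even-indexed parts are plain text.
        $parts = preg_split($pattern, $node->nodeValue, -1, PREG_SPLIT_DELIM_CAPTURE);
        if (count($parts) < 2) {
            continue; // no keyword in this text node
        }
        $fragment = $doc->createDocumentFragment();
        foreach ($parts as $i => $part) {
            if ($part === '') {
                continue;
            }
            if ($i % 2 === 1) {
                $span = $doc->createElement('span');
                $span->setAttribute('class', 'highlighted_text');
                $span->appendChild($doc->createTextNode($part));
                $fragment->appendChild($span);
            } else {
                $fragment->appendChild($doc->createTextNode($part));
            }
        }
        $node->parentNode->replaceChild($fragment, $node);
    }

    // Serialize only the children of the wrapper, i.e. the original fragment.
    $root = $xpath->query('//div[@id="hl-root"]')->item(0);
    $out  = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$message = 'Show the img: <img src="img.png"> img';
echo highlightTextNodes($message, ['img']), "\n";
```

Note that the parser may normalize the markup when it serializes it back (quoting, entity escaping), so the output is not guaranteed to be byte-identical to the input outside the highlighted words.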

Amber
A: 

Either use a regular expression (hard) instead of the plain string search-and-replace functions, or disallow keywords that match HTML tag names (and attribute names, etc.).

zaf
Regular expressions are not designed for parsing non-regular languages like HTML DOM and should not be used for such.
Amber
A: 

Simply do this with CSS.

Bharanikumar
A: 

I agree with Dav. Load $message into a DOM parser such as simplehtmldom (manual), iterate over the text nodes, and make the necessary replacements.

I believe the $simplehtmldomobject->find('text'); method will be very helpful.

Salman A
Is it right that I have to use str_replace to replace words in the text, or does the HTML parser have a function for this?
Arjen
Whatever HTML parser you use should have functions for editing or replacing DOM content. You will still need PHP's string replacement functions, but with the parser you can be sure that the text you are manipulating does not contain any HTML tags.
Salman A
A: 

Thanks for the answers. But how can you return the whole text, including the HTML, once the replacements are finished?

Arjen
A: 

I now use the Simple HTML DOM parser to process only the text outside the HTML tags and replace the keywords with this code:

    $message = str_get_html($message);

    foreach ($message->find('text') as $e) {
        foreach ($keywords as $words) {
            $e->outertext = str_ireplace($words, '<span class="highlighted_text">'.$words.'</span>', $e);
        }
    }

This works fine, but when I use the following keywords array: array('a','b','c','d','e','f','g','h','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z'); I get a PHP memory error. Is there a way to replace words with the help of the Simple HTML DOM parser?
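
A likely cause, though the thread never confirms it, is that each pass of the inner loop works on the already-modified text. Single-letter keywords such as 's', 'p', 'a' and 'n' then also match inside the <span class="..."> markup inserted by earlier passes, and the string grows with every keyword. The sketch below compares that sequential approach with a single-pass replacement built from one regular expression. The helper name highlightOnce() and the sample data are hypothetical; they are not part of Simple HTML DOM.

```php
<?php
// Hypothetical helper for illustration; not part of Simple HTML DOM.
function highlightOnce(string $text, array $keywords): string
{
    $keywords = array_filter($keywords, 'strlen'); // ignore empty keywords
    if (!$keywords) {
        return $text;
    }
    // Escape each keyword and join them into one case-insensitive alternation.
    $quoted  = array_map(function ($w) { return preg_quote($w, '/'); }, $keywords);
    $pattern = '/(' . implode('|', $quoted) . ')/i';
    // $1 re-inserts the matched text, so the original casing is preserved and
    // the inserted markup is never scanned again.
    return preg_replace($pattern, '<span class="highlighted_text">$1</span>', $text);
}

$keywords = ['a', 'n', 'p', 's'];
$text     = 'spa';

// Sequential replacement, as in the loop above: later keywords also match
// inside the span tags added for earlier ones.
$sequential = $text;
foreach ($keywords as $word) {
    $sequential = str_ireplace($word, '<span class="highlighted_text">' . $word . '</span>', $sequential);
}

$single = highlightOnce($text, $keywords);

echo strlen($sequential), ' bytes sequentially vs ', strlen($single), " bytes in one pass\n";
```

Inside the Simple HTML DOM loop this would presumably be used as `$e->outertext = highlightOnce($e->outertext, $keywords);`, with the keyword loop removed.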

Arjen
A: 

From http://forum.phpfrance.com/vos-contributions/remplacement-selectif-hors-dans-balises-html-t199.html

function mon_rplc_callback($capture){
  global $arg;
  return ($arg['flag'] == 1)
  ? $arg['fct']($arg['from'], $arg['to'], $capture[1]).$capture[2]
  : $capture[1].$arg['fct']($arg['from'], $arg['to'], $capture[2]);
}

function split_tag($from, $to, $txt, $fct, $flag = 1){
  global $arg;
  $arg = compact('from', 'to', 'fct', 'flag');
  return preg_replace_callback('#((?:(?!<[/a-z]).)*)([^>]*>|$)#si', "mon_rplc_callback", $txt);
}

When $flag == 1, the replacement function is applied outside HTML. When $flag == -1, the replacement function is applied inside HTML.

Applied to your example, it would give something like this:

echo split_tag($words, '<span class="hightlighted_text">'.$words.'</span>', $message, 'str_ireplace', 1);

Enjoy! ;)

Savageman
Thanks, but this doesn't use an HTML parser. The posters above said that an HTML parser would be better in this case (??)
Arjen
And I'm telling you that regular expressions can do a very good job here. Plus, you don't need valid HTML: it will just work.
Savageman
Thanks for your reply. The only problem is that when the keywords (the ones to be highlighted) include every alphanumeric character, I get a memory leak error.
Arjen
Hmm, that looks strange. Can you provide the code you used? By the way, I just read the argument about why regex can't parse HTML. I mostly agree with it, but the problem here is MUCH simpler: we only need to know whether we are inside an HTML tag or not. We don't have to parse an HTML structure, interpret a tree, or handle mismatched tags and other errors. We just need to know that an HTML tag is whatever sits between < and >. I believe a regex is well suited to this.
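
The "between < and >" idea can also be sketched without a callback or global state by splitting the string on tags and replacing only in the pieces that are not tags. This is a simplified illustration, not the split_tag() function quoted above: highlightOutsideTags() is a hypothetical name, and the sketch assumes that attribute values never contain a literal ">" and that the text has no comments or script blocks.

```php
<?php
// Hypothetical helper for illustration; it is not the split_tag() function quoted above.
function highlightOutsideTags(string $html, array $keywords): string
{
    $keywords = array_filter($keywords, 'strlen'); // ignore empty keywords
    if (!$keywords) {
        return $html;
    }
    $quoted  = array_map(function ($w) { return preg_quote($w, '/'); }, $keywords);
    $pattern = '/(' . implode('|', $quoted) . ')/i';

    // Split on anything that looks like a tag and keep the tags in the result.
    // Even-indexed pieces are text, odd-indexed pieces are the captured tags.
    $pieces = preg_split('/(<[^>]*>)/', $html, -1, PREG_SPLIT_DELIM_CAPTURE);
    foreach ($pieces as $i => $piece) {
        if ($i % 2 === 0 && $piece !== '') {
            $pieces[$i] = preg_replace($pattern, '<span class="highlighted_text">$1</span>', $piece);
        }
    }
    return implode('', $pieces);
}

echo highlightOutsideTags('An img tag: <img src="img.png" alt="img"> (img)', ['img']), "\n";
// Prints: An <span class="highlighted_text">img</span> tag: <img src="img.png" alt="img"> (<span class="highlighted_text">img</span>)
```

Because all keywords are replaced in a single pass per text piece, the inserted span markup is never searched again.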
Savageman