ansaurus

Question

Answer 1

+3 A:

Use a DOM parser of some sort. This is not something you want to do with regex.

Amber 2010-04-07 08:52:51

Answer 2

A:

Either use a regular expression (hard) instead of the string search-and-replace functions, or disallow words that match HTML tags (and attributes, etc.).

zaf 2010-04-07 08:54:56

Regular expressions are not designed for parsing non-regular languages such as HTML and should not be used for that.

Amber 2010-04-07 09:06:21

Answer 3

A:

Simply do this with CSS.

Bharanikumar 2010-04-07 09:12:29

Answer 4

A:

I agree with Dav. Load $message into a DOM parser such as simplehtmldom (manual), iterate over the text nodes and make the necessary replacements.

I believe the $simplehtmldomobject->find('text'); method will be very helpful.

Salman A 2010-04-07 09:24:32

Is it right that I have to use str_replace to replace words in the text, or does the HTML parser have a function for this?

Arjen 2010-04-07 10:52:37

Whatever HTML parser you use should have functions for editing or replacing DOM content. You will still need PHP's string replacement functions, but with the parser you can be sure that the text you are manipulating does not contain any HTML tags.

Salman A 2010-04-07 13:17:28
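
To make the advice above concrete, here is a minimal sketch of the text-node approach. It deliberately uses PHP's bundled DOM extension (DOMDocument and DOMXPath) rather than simplehtmldom, so it is a swapped-in technique and not the code discussed in this thread. The function name highlight_html, the wrapper <div> and the single-keyword signature are assumptions made for illustration; the sketch also assumes markup that libxml can parse, and it ignores character-encoding and <script>/<style> concerns.

```php
<?php
// Hypothetical helper: wrap every occurrence of $word found in the visible
// text of $html in a highlight span, leaving tags and attributes alone.
function highlight_html(string $html, string $word): string
{
    if ($word === '') {
        return $html;
    }

    $doc = new DOMDocument();
    // Collect libxml warnings about sloppy markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    // The wrapper <div> is an assumption that lets us find the fragment again.
    $doc->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $root    = $doc->getElementsByTagName('div')->item(0);
    $xpath   = new DOMXPath($doc);
    $pattern = '/(' . preg_quote($word, '/') . ')/i';

    // Copy the node list to an array first, because the loop edits the tree.
    foreach (iterator_to_array($xpath->query('.//text()', $root)) as $node) {
        // With the capture group, odd indices hold the matched word.
        $parts = preg_split($pattern, $node->nodeValue, -1, PREG_SPLIT_DELIM_CAPTURE);
        if (count($parts) < 2) {
            continue; // no match in this text node
        }
        $fragment = $doc->createDocumentFragment();
        foreach ($parts as $i => $part) {
            if ($i % 2 === 1) {
                $span = $doc->createElement('span');
                $span->setAttribute('class', 'highlighted_text');
                $span->appendChild($doc->createTextNode($part));
                $fragment->appendChild($span);
            } elseif ($part !== '') {
                $fragment->appendChild($doc->createTextNode($part));
            }
        }
        $node->parentNode->replaceChild($fragment, $node);
    }

    // Serialize only the children of the wrapper, i.e. the original fragment.
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```

Called as highlight_html($message, 'span'), this should leave an attribute such as class="span" untouched while wrapping the same word in the visible text. The serialization step at the end is also how the whole text, markup included, comes back as a string once the replacements are done.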

Answer 5

A:

Thanks for the answers. But how can you return the whole text, including the HTML, once the replacements are finished?

Arjen 2010-04-07 10:35:37

Answer 6

A:

I now use the Simple HTML DOM parser to pick out the text outside the HTML tags and replace some words with this code:

    $message = str_get_html($message);

    foreach ($message->find('text') as $e) {
        foreach ($keywords as $words) {
            $e->outertext = str_ireplace($words, '<span class="highlighted_text">' . $words . '</span>', $e);
        }
    }

This works fine, but when I use the following keywords array: array('a','b','c','d','e','f','g','h','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z'); I get a PHP memory error. Is there a way to replace words with the help of the Simple HTML DOM parser?

Arjen 2010-04-07 14:48:25
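
A likely cause of the memory error is that each pass of the inner loop runs over the output of the previous pass, so single-letter keywords such as 's', 'p', 'a' and 'n' also match inside the <span class="highlighted_text"> markup that was just inserted, and the string grows with every keyword. One way around this, sketched below, is to replace all keywords in a single pass per text node. The function name highlight_keywords is hypothetical, the sketch assumes PHP 7.4 or later (arrow functions), and it expects the plain text of one text node rather than markup.

```php
<?php
// Hypothetical helper: highlight every keyword in $text in ONE pass, so the
// inserted <span> markup is never scanned again by a later keyword.
function highlight_keywords(string $text, array $keywords): string
{
    // Drop empty keywords; an empty alternative would match everywhere.
    $keywords = array_values(array_filter($keywords, fn($k) => $k !== ''));
    if ($keywords === []) {
        return $text;
    }

    // Longest keywords first, so "span" is preferred over "s" at the same spot.
    usort($keywords, fn($a, $b) => strlen($b) <=> strlen($a));

    // Build one case-insensitive alternation: /kw1|kw2|.../i
    $alternatives = array_map(fn($k) => preg_quote($k, '/'), $keywords);
    $pattern      = '/' . implode('|', $alternatives) . '/i';

    return preg_replace_callback(
        $pattern,
        fn($m) => '<span class="highlighted_text">' . $m[0] . '</span>',
        $text
    );
}
```

Inside the loop over text nodes, the inner foreach could then be replaced by a single assignment along the lines of $e->outertext = highlight_keywords($e->outertext, $keywords). Note that entities in the text (for example &amp;) would still be matched letter by letter, so this remains a sketch rather than a complete solution.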

Answer 7

A:

From http://forum.phpfrance.com/vos-contributions/remplacement-selectif-hors-dans-balises-html-t199.html

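    // Callback for preg_replace_callback(): $capture[1] is the text before a tag,
    // $capture[2] is the tag that follows (or an empty string at the end of the input).
    function mon_rplc_callback($capture){
      global $arg;
      return ($arg['flag'] == 1)
      ? $arg['fct']($arg['from'], $arg['to'], $capture[1]).$capture[2]
      : $capture[1].$arg['fct']($arg['from'], $arg['to'], $capture[2]);
    }

    // Apply $fct($from, $to, ...) either outside the tags ($flag == 1) or inside them.
    function split_tag($from, $to, $txt, $fct, $flag = 1){
      global $arg;
      $arg = compact('from', 'to', 'fct', 'flag');
      return preg_replace_callback('#((?:(?!<[/a-z]).)*)([^>]*>|$)#si', "mon_rplc_callback", $txt);
    }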

When $flag == 1, the replacement function is applied to the text outside the HTML tags. When $flag == -1, the replacement function is applied inside the tags.

Applied to your example, it would give something like this:

    echo split_tag($words, '<span class="highlighted_text">'.$words.'</span>', $message, 'str_ireplace', 1);

Enjoy! ;)

Savageman 2010-04-07 14:58:30

Thanks, but this does not use an HTML parser. The posters above said that an HTML parser would be better in this case (?)

Arjen 2010-04-07 15:02:17

And I am telling you that regular expressions can do a very good job here. Plus, you won't need valid HTML: it will just work.

Savageman 2010-04-07 15:12:32

Thanks for your reply. The only problem is that when the keywords to be highlighted include all the alphanumeric characters, I get a memory error.

Arjen 2010-04-07 15:57:23

Hmm, that looks strange. Can you provide the code you used? By the way, I just read the argument about why regex can't parse HTML. I largely agree with it, but the problem here is MUCH simpler: we just need to know whether we are inside an HTML tag or not. We don't have to parse an HTML structure, interpret a tree, or deal with mismatched tags or other errors. We just need to know that an HTML tag is something between < and >. I believe a regex is well suited here.

Savageman 2010-04-07 22:05:03
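
The "a tag is something between < and >" idea can also be sketched without globals by splitting the input on tags and touching only the pieces in between. This is an illustration, not Savageman's code: the function name replace_outside_tags is hypothetical, and the sketch assumes that no attribute value, comment or script block contains a literal > character.

```php
<?php
// Hypothetical helper: apply $replace to the text between tags only.
function replace_outside_tags(string $html, callable $replace): string
{
    // With PREG_SPLIT_DELIM_CAPTURE the result alternates between text and
    // tags: even indices are text, odd indices are the captured tags.
    $parts = preg_split('/(<[^>]*>)/', $html, -1, PREG_SPLIT_DELIM_CAPTURE);

    foreach ($parts as $i => $part) {
        if ($i % 2 === 0) {
            $parts[$i] = $replace($part);
        }
    }

    return implode('', $parts);
}

// Usage: highlight the word "span" in the text, but not in the markup.
echo replace_outside_tags(
    '<p class="span">A span here</p>',
    function ($text) {
        return str_ireplace('span', '<span class="highlighted_text">span</span>', $text);
    }
);
```

Because the callback only ever sees text that lies outside the tags, any replacement function can be plugged in, including a single-pass one that handles many keywords at once.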

Highlight text, except html tags
