hello!

i find regex kinda confusing so i got stuck with this problem:

i need to insert <b> tags around certain keywords in a given text. problem is that if the keyword is within the href attribute, it results in a broken link.

the code goes like this:

$text = preg_replace('/(\b'.$keyword.'\b)/i','<b>\1</b>',$text);

so for cases like

this <a href="keyword.php">keyword</a> here

i end up with:

this <a href="<b>keyword</b>.php"><b>keyword</b></a> here

i tried all sorts of combinations but i still couldn't get the right pattern.

thanks!

+3  A: 

You can't do that with regex alone. Regular expressions are powerful, but they can't parse a recursive grammar like HTML.

Instead, you should properly parse the HTML using an existing HTML parser. You just echo the HTML as-is until you encounter a text node. In that case, you run your preg_replace on the text before echoing it.

If your HTML is valid XHTML, you can use the xml_parse function. If it's not, then use whatever HTML parser is available.
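
A minimal sketch of the parse-then-replace idea, for illustration only. It uses PHP's DOMDocument and DOMXPath rather than the xml_parse event parser mentioned above, and the helper name boldKeyword is hypothetical. It assumes a small HTML fragment and does no character-encoding handling, so non-ASCII input would need extra care.

```php
<?php
// Hypothetical helper (not from the answer above): wraps $keyword in <b>
// inside text nodes only, so attribute values such as href are never touched.
// Assumption: $html is a small fragment; encoding handling is omitted.
function boldKeyword(string $html, string $keyword): string
{
    $previous = libxml_use_internal_errors(true); // silence warnings on sloppy HTML
    $doc = new DOMDocument();
    // Wrap the fragment in a known container so it can be found again later.
    $doc->loadHTML('<div id="wrap">' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath   = new DOMXPath($doc);
    $pattern = '/\b(' . preg_quote($keyword, '/') . ')\b/i';

    // Copy the node list first, because the loop below modifies the tree.
    $query = '//text()[not(ancestor::script) and not(ancestor::style)]';
    $nodes = iterator_to_array($xpath->query($query));

    foreach ($nodes as $node) {
        // Odd-indexed parts are the matched keywords, even-indexed parts the rest.
        $parts = preg_split($pattern, $node->nodeValue, -1, PREG_SPLIT_DELIM_CAPTURE);
        if (count($parts) < 2) {
            continue; // keyword not present in this text node
        }
        $fragment = $doc->createDocumentFragment();
        foreach ($parts as $i => $part) {
            if ($i % 2 === 1) {
                $b = $doc->createElement('b');
                $b->appendChild($doc->createTextNode($part));
                $fragment->appendChild($b);
            } elseif ($part !== '') {
                $fragment->appendChild($doc->createTextNode($part));
            }
        }
        $node->parentNode->replaceChild($fragment, $node);
    }

    // Serialise only the children of the wrapper, not the added html/body tags.
    $wrap = $xpath->query('//div[@id="wrap"]')->item(0);
    $out  = '';
    foreach ($wrap->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo boldKeyword('this <a href="keyword.php">keyword</a> here', 'keyword'), "\n";
```

Because the replacement only ever sees text nodes, the href value is left alone without any special-casing in the pattern.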

BatchyX
It is possible with regular expressions (even without using recursive patterns). But it would be one hell of a regular expression, with absolutely horrible efficiency.
Gumbo
Well, prove it. Make a regex that replaces a keyword in an HTML file only when the keyword is text, and not inside a <script> or <input> or in some attribute. XHTML is a context-free grammar; regex can only recognize a subset of such languages.
BatchyX
+1  A: 

You can use preg_replace again after the first replacement to remove the <b> tags from the href:

$text=preg_replace('#(href="[^"]*)<b>([^"]*)</b>#i',"$1$2",$text);
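
As a rough sketch, the two passes could be combined like this. The function name boldOutsideHref is hypothetical, and the loop is an addition: each call of the clean-up pattern strips only one <b> pair per href, so it is repeated until nothing changes. It assumes double-quoted href values, as in the question.

```php
<?php
// Hypothetical helper combining the question's replacement with the clean-up pass.
// Assumption: href values are always wrapped in double quotes.
function boldOutsideHref(string $text, string $keyword): string
{
    // Pass 1: bold every whole-word occurrence, including those inside href.
    $text = preg_replace('/\b(' . preg_quote($keyword, '/') . ')\b/i', '<b>$1</b>', $text);

    // Pass 2: strip <b>...</b> again wherever it landed inside an href value.
    // One call removes at most one pair per href, so repeat until stable.
    do {
        $text = preg_replace('#(href="[^"]*)<b>([^"]*)</b>#i', '$1$2', $text, -1, $count);
    } while ($count > 0);

    return $text;
}

echo boldOutsideHref('this <a href="keyword.php">keyword</a> here', 'keyword'), "\n";
// prints: this <a href="keyword.php"><b>keyword</b></a> here
```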
mck89
+1  A: 

Yes, you can use regex like that, but the code might become a little convoluted. Here is a quick example:

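$string  = '<a href="keyword.php">link text with keyword and stuff</a>';
$keyword = 'keyword';
// Groups: 1 = tag up to the href value, 2 = keyword, 3 = rest of the opening tag,
// 4 = link text, 5 = closing tag. The dot in ".php" is escaped to match literally.
$text    = preg_replace(
               '/(<a href=")('.preg_quote($keyword, '/').')(\.php">)(.*)(<\/a>)/',
               '$1$2$3<b>$4</b>$5',
               $string
           );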

echo $string."\n";
echo $text."\n";

The text matched by each () group is available in the replacement as $1, $2 ... $n, so I don't have to type stuff over again. The match can also be made more generic to match different kinds of URL syntax if needed.

Seeing this solution you might want to rethink the way you plan to do matching of keywords in your code. :)

output:

<a href="keyword.php">link text with keyword and stuff</a>
<a href="keyword.php"><b>link text with keyword and stuff</b></a>
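
One way the "more generic" match might look, as a sketch only: it accepts any attributes on the <a> tag and bolds just the keyword inside the link text instead of the whole text. The function name boldKeywordInLinks is hypothetical, and it assumes links are not nested and that the link text contains no other tags whose attributes mention the keyword. Text outside links is left untouched.

```php
<?php
// Hypothetical helper: bolds $keyword inside the text of every <a> element,
// whatever its href looks like. Assumes no nested links.
function boldKeywordInLinks(string $html, string $keyword): string
{
    $word = '/\b(' . preg_quote($keyword, '/') . ')\b/i';

    return preg_replace_callback(
        // 1 = opening tag with any attributes, 2 = link text, 3 = closing tag
        '/(<a\s[^>]*>)(.*?)(<\/a>)/is',
        function (array $m) use ($word): string {
            // Only the link text (group 2) is searched for the keyword.
            return $m[1] . preg_replace($word, '<b>$1</b>', $m[2]) . $m[3];
        },
        $html
    );
}

echo boldKeywordInLinks('<a href="keyword.php">link text with keyword and stuff</a>', 'keyword'), "\n";
// prints: <a href="keyword.php">link text with <b>keyword</b> and stuff</a>
```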
thomasmalt