I'm using a regular expression to search for a bunch of keywords in a text.

All keywords are found except one: [DAM]Berlin. I know it contains square brackets, so I escaped them, but still no luck. What am I doing wrong?

Here is my PHP code.

The text to search for keywords:

$textToSearch= '<p><br>
Time ¦ emit LAb[au] <br>
<br>
[DAM]Berlin gallery<br>
<br>
Exhibition: February 21st - March 28th, 2009 <br>
<br>
Opening: Friday,  February 20th, 2009 7-9 pm <br>';

The regular expression:

$find='/(?![^<]+>)\b(generative art console|Game of Life|framework notations|framework|Floating numbers|factorial|f5x5x3|f5x5x1|eversion|A-plus|16n|\[DAM\]Berlin gallery)\b/s';

The replace callback function:

function replaceCallback( $match )
{
    if ( is_array( $match ) )
    {
        $htmlVersion = htmlspecialchars( $match[1], ENT_COMPAT, 'UTF-8' );
        $urlVersion  = urlencode( $match[1] );
        return '<a class="tag" rel="tag-definition" title="Click to know more about ' . $htmlVersion . '" href="?tag=' . $urlVersion . '">' . $htmlVersion . '</a>';
    }
    return $match;
}

And finally, the call:

$tagged_content = preg_replace_callback($find, 'replaceCallback',  $textToSearch);

Thank you for your help!

+2  A: 

I think it's because [ isn't a "word character", so \b[ can only match when a word character comes right before the [. That isn't the case at the beginning of [DAM]Berlin in your text. You probably need to change your regex to:

$find='/(?![^<]+>)(\b(?:generative art console|Game of Life|framework notations|framework|Floating numbers|factorial|f5x5x3|f5x5x1|eversion|A-plus|16n)|\[DAM\]Berlin gallery)\b/s';
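To illustrate the point about \b, here is a minimal sketch that isolates the one failing keyword. The subject string is a shortened, hypothetical version of the text in the question, and the variable names are only for illustration.

```php
<?php
// The keyword is preceded by a newline, as in the question's text.
$text = "<br>\n[DAM]Berlin gallery<br>";

// \b needs a word character on exactly one side of the position.
// Here the newline and "[" are both non-word characters, so there is no boundary.
$withBoundary = preg_match('/\b\[DAM\]Berlin gallery\b/', $text);

// Dropping the leading \b lets the escaped bracket match.
$withoutBoundary = preg_match('/\[DAM\]Berlin gallery\b/', $text);

var_dump($withBoundary);    // int(0)
var_dump($withoutBoundary); // int(1)
```

The trailing \b still works because "y" is a word character and the following "<" is not.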


Edit: From Daniel James's comment:

This might be closer to the original intent, as it will still check that '[Dam]' doesn't follow a word character:

$find='/(?![^<]+>)(?<!\w)(generative art console|Game of Life|framework notations|framework|Floating numbers|factorial|f5x5x3|f5x5x1|eversion|A-plus|16n|\[DAM\]Berlin gallery)\b/s';
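As a rough check of how the (?<!\w) variant behaves, the sketch below uses a shortened keyword list and a simplified, hypothetical callback ($wrap) in place of replaceCallback(). The two subject strings are made up for illustration.

```php
<?php
// Shortened keyword list; the lookarounds are the ones from the pattern above.
$find = '/(?![^<]+>)(?<!\w)(framework|\[DAM\]Berlin gallery)\b/s';

// Hypothetical stand-in for replaceCallback(): wraps each hit in <b> tags.
$wrap = function ($match) {
    return '<b>' . $match[1] . '</b>';
};

// Keyword preceded by a newline: (?<!\w) is satisfied, so it is replaced.
$afterNewline = preg_replace_callback($find, $wrap, "<br>\n[DAM]Berlin gallery<br>");

// Keywords glued to a preceding word character: (?<!\w) rejects both.
$afterWord = preg_replace_callback($find, $wrap, "x[DAM]Berlin gallery and myframework");

echo $afterNewline, "\n"; // the keyword is wrapped in <b>...</b>
echo $afterWord, "\n";    // printed unchanged
```

Unlike \b, the lookbehind only says what must not come before the keyword, so it does not care whether the keyword itself starts with a word character.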
Helen
I tried your suggestion, but PHP returns a warning. Is there an alternative to the \b flag? Here is the warning: preg_replace_callback(): Compilation failed: unmatched parentheses at offset 166
pixeline
Remove the closing parenthesis at the end: \b)/s (this one)
jitter
This might be closer to the original intent, as it will still check that '[Dam]' doesn't follow a word character: $find='/(?![^<]+>)(?<!\w)(generative art console|Game of Life|framework notations|framework|Floating numbers|factorial|f5x5x3|f5x5x1|eversion|A-plus|16n|\[DAM\]Berlin gallery)\b/s';
Daniel James
Amazingly, in the final code, Daniel's suggestion is the one that works. Helen's suggestion broke the resulting text somewhere. The fact that I'm searching for a lot of keywords might be the reason. Thanks to both of you anyway!
pixeline
@pixeline: That must be the new capturing group I added... Turned it into a non-capturing one. But Daniel's variant should be more reliable anyway.
Helen
+1  A: 

The first section of your Regex is '/(?![^<]+>)\b' so wouldn't it only match "[DAM]Berlin gallery" if the character before it was a '>'?
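For what it's worth, (?![^<]+>) is a negative lookahead, so it constrains what comes after the current position rather than the character before it: it rejects positions from which a ">" can be reached without crossing a "<", which roughly means positions inside a tag. A small sketch with a hypothetical one-keyword pattern and a made-up HTML snippet:

```php
<?php
// Hypothetical pattern that keeps only the lookahead from the question.
$find = '/(?![^<]+>)framework/';

// The first "framework" sits inside a tag (a ">" follows before any "<");
// the second one is plain text between tags.
$html = '<a title="framework">framework</a>';

preg_match_all($find, $html, $matches, PREG_OFFSET_CAPTURE);

foreach ($matches[0] as $hit) {
    // Each $hit is array(matched text, byte offset).
    echo $hit[0], ' at offset ', $hit[1], "\n";
}
// prints: framework at offset 21
```

Only the occurrence in the text content is reported; the one inside the title attribute is skipped.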

Try:

$find='/(?![^<]+>)\b(generative art console|Game of Life|framework notations|framework|Floating numbers|factorial|f5x5x3|f5x5x1|eversion|A-plus|16n|\[DAM\]Berlin gallery)\b/sm';

That adds the m modifier to your regex so that it will ignore new lines
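One caveat worth checking before relying on this: in PCRE, the m modifier (PCRE_MULTILINE) only changes where ^ and $ can match, so a pattern without those anchors should behave the same with or without it. A quick sketch, using a shortened, hypothetical subject string and only the problem keyword:

```php
<?php
$text = "<br>\n[DAM]Berlin gallery<br>";

// Same pattern with and without m; neither contains ^ or $.
$plain     = preg_match('/\b\[DAM\]Berlin gallery\b/s',  $text);
$multiline = preg_match('/\b\[DAM\]Berlin gallery\b/sm', $text);

// For contrast, m does change what ^ can match.
$caretPlain     = preg_match('/^\[DAM\]/',  $text);
$caretMultiline = preg_match('/^\[DAM\]/m', $text);

var_dump($plain, $multiline);           // int(0) int(0)
var_dump($caretPlain, $caretMultiline); // int(0) int(1)
```

In this sketch the leading \b fails either way, so adding m alone does not make the keyword match.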

http://www.phpro.org/tutorials/Introduction-to-PHP-Regex.html#8

"[the m modifier] treats a string as having only a single newline character at the end, even if there are multiple new lines in our string."

0xC0DEFACE