I am writing a PHP function that checks a given string for bad whole words (whole words only, not substrings) and can also highlight those whole words in the string.

function badwordChecherAndHighLighter($str,$replace){
    // $replace=1 will highlight the bad words
    // $replace=0 will check for the existence of any bad words

    $result = mysql_query("SELECT settings_badwords_en,settings_badwords_ar FROM settings_badwords WHERE settings_badwords_status=1") or die(mysql_error());

    // I don't build an array, since that may add overhead, so I apply each word directly in preg_replace

    if($replace==1){
        while($row = mysql_fetch_row($result))
        {
            //$str=preg_replace('/'.$row[0].'/i', str_repeat("*",strlen($row[0])), $str);
            $str=preg_replace('/\b('.$row[0].'\b)/i',"" .$row[0] . "" , $str);
            $str=preg_replace('/\b('.$row[1].'\b)/i',"" .$row[1] . "" , $str);
        }
        return $str;
    }else{
        while($row = mysql_fetch_row($result))
        {
            if(preg_match('/\b('.$row[0].'\b)/i',$str)) return 1;
            if(preg_match('/\b('.$row[1].'\b)/i',$str)) return 1;
        }
        return 0;
    }
}

// $row[1] contains the Arabic bad words, and $row[0] contains the English bad words.

This function gives correct results on Windows (WAMP5 1.7.3) for both Arabic and English.

But on the web server it only works for English words, not for Arabic.

So if Arabic text is given to this function, it cannot detect any bad word and cannot highlight the Arabic words either.

I searched and tried many options, including \u, but got no error and no success.

So please help.

+1  A: 

The \b word boundary does not work with UTF-8 characters. Try this:

preg_match('/(?<=^|[^\p{L}])' . preg_quote($utf8word,'/') . '(?=[^\p{L}]|$)/ui',$utf8string);
turbod
`(?<!\p{L})` and `(?!\p{L})` are probably better.
Gumbo
@Gumbo: Your look-around for Unicode letters is almost correct - it does not account for start-of-string and end-of-string conditions.
turbod
@turbod: Sure. If the current position is the start or end of the string, then there is no Unicode letter before or after the current position respectively.
Gumbo
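To illustrate the point about string boundaries, here is a minimal sketch using the negative look-arounds. The helper name `containsWholeWord` is made up for this example, and it assumes PHP 7+ and a PCRE build with Unicode property support (the `\p{L}` class).

```php
<?php
// Hypothetical helper (not from the thread): true when $word occurs in $text
// as a whole word, i.e. not directly preceded or followed by a Unicode letter.
function containsWholeWord(string $text, string $word): bool
{
    // preg_quote() escapes regex metacharacters in the word itself.
    $pattern = '/(?<!\p{L})' . preg_quote($word, '/') . '(?!\p{L})/ui';
    return preg_match($pattern, $text) === 1;
}

var_dump(containsWholeWord('radar', 'radar'));         // the word is the entire string
var_dump(containsWholeWord('a radar, here', 'radar')); // surrounded by non-letters
var_dump(containsWholeWord('radars', 'radar'));        // followed by a letter
```

The first call shows that no explicit `^` or `$` alternative is needed: at the start or end of the string there is simply no letter for the negative look-around to reject.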
First of all, thanks to all of you for your responses. I applied preg_match('/(?<=^|[^\p{L}])' . preg_quote($row[1],'/') . '(?=[^\p{L}]|$)/ui',$str); // here $row[1] contains Arabic text
but I still get:
Warning: preg_replace() [function.preg-replace]: Compilation failed: invalid UTF-8 string at offset 16 in /home/sitename/public_html/testpage.php on line 31
Asad kamran
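The "invalid UTF-8 string" message means PCRE was handed bytes that are not valid UTF-8 while the `u` modifier was in effect. The thread never establishes where those bytes came from, so the following is only a diagnostic sketch: a hypothetical `isValidUtf8` helper that can be run on the word and on the text before building the pattern.

```php
<?php
// Hypothetical helper: checks whether a string is valid UTF-8.
function isValidUtf8(string $s): bool
{
    // With the /u modifier, preg_match() returns false (not 0 or 1)
    // when the subject is not valid UTF-8; the empty pattern matches anything valid.
    return @preg_match('//u', $s) === 1;
}

var_dump(isValidUtf8('رادار'));    // a literal from this file, assuming the file is saved as UTF-8
var_dump(isValidUtf8("\xD1\xC7")); // a lead byte followed by a non-continuation byte
```

If the check fails for a word read from the database, the encoding of the column or of the connection would be the first thing to inspect.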
I ran this command on the server to check the PCRE settings, which are as follows:
root@mail [~]# pcretest -C
PCRE version 6.6 06-Feb-2006
Compiled with
  UTF-8 support
  No Unicode properties support
  Newline character is LF
  Internal link size = 2
  POSIX malloc threshold = 10
  Default match limit = 10000000
  Default recursion depth limit = 10000000
  Match recursion uses stack
Another point: locally on WAMP, Windows OS, the result is correct, and all whole words in Arabic are highlighted. Thanks in advance for upcoming suggestions and solutions.
Asad kamran
This is the Arabic text: رادار وكاميرا تجسس المطلوبة.ولكن لا الرادارات
In English: Radar and spy camera required, but not radars. Here I want to highlight "Radar" and "spy", in Arabic رادار and تجسس, but not the word "radars", in Arabic الرادارات. Any help is appreciated.
Asad kamran
@Asad kamran: Try this: `preg_match_all('/(?<=^|[^\p{L}])' . preg_quote('الرادارات','/') . '(?=[^\p{L}]|$)/ui',' رادار وكاميرا تجسس المطلوبة.ولكن لا الرادارات الرادارات',$m);print_r($m);` This works correctly on my PC.
turbod
I tried this but with no success. It raises the warning preg_replace() [function.preg-replace]: Compilation failed: invalid UTF-8 string at offset, on both local and server.
Asad kamran
@Asad kamran: pcretest -C returns this on my PC:
PCRE version 7.8 2008-09-05
Compiled with
  UTF-8 support
  Unicode properties support
  Newline sequence is LF
  \R matches all Unicode newlines
  Internal link size = 2
  POSIX malloc threshold = 10
  Default match limit = 10000000
  Default recursion depth limit = 10000000
  Match recursion uses stack
Probably the PCRE on your computer does not support Unicode character properties.
turbod
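When `pcretest` is not available, the same question can be asked from PHP itself. This is a rough sketch with a made-up helper name, `pcreSupportsUnicodeProperties`; it only reports whether the PCRE library used by the running PHP can compile `\p{L}`.

```php
<?php
// Hypothetical helper: reports whether this PHP build's PCRE can compile \p{L}.
function pcreSupportsUnicodeProperties(): bool
{
    // @ suppresses the compile warning emitted when support is missing;
    // in that case preg_match() returns false instead of 0 or 1.
    return @preg_match('/\p{L}/u', 'a') === 1;
}

// PCRE_VERSION is a built-in constant holding the library version and release date.
echo 'PCRE version: ', PCRE_VERSION, PHP_EOL;
echo 'Unicode properties: ', pcreSupportsUnicodeProperties() ? 'yes' : 'no', PHP_EOL;
```

Running this on both the local machine and the web server shows whether the two environments differ in the way the two `pcretest -C` outputs suggest.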
@turbod: Thank you. I do not know what was wrong, but when I tried the pattern "/(?<!\pL)".$row[0]."(?!\pL)/i" it worked, and then I tried your first suggestion and it also worked. Great, the issue is resolved. Thank you!
Asad kamran
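Putting the pieces of the thread together, the check and the highlight could look roughly like the sketch below. It is not the asker's final code: the word list is passed in as an array instead of being read from the `settings_badwords` table, the function names are invented for this example, and the `<b>` wrapper is an assumed highlight markup. It also assumes PHP 7+ and a PCRE build with Unicode property support.

```php
<?php
// Builds one whole-word pattern for all words (hypothetical helper).
function buildWholeWordPattern(array $words): string
{
    // Escape each word so regex metacharacters in it are matched literally.
    $quoted = array_map(function ($w) { return preg_quote($w, '/'); }, $words);
    return '/(?<!\p{L})(?:' . implode('|', $quoted) . ')(?!\p{L})/ui';
}

// Returns true if any word from $words occurs in $text as a whole word.
function hasBadWord(string $text, array $words): bool
{
    if ($words === []) {
        return false; // an empty alternation would match everywhere
    }
    return preg_match(buildWholeWordPattern($words), $text) === 1;
}

// Wraps every whole-word occurrence in <b>...</b> (assumed markup).
function highlightBadWords(string $text, array $words): string
{
    if ($words === []) {
        return $text;
    }
    // $0 is the matched text, so the original casing is preserved.
    return preg_replace(buildWholeWordPattern($words), '<b>$0</b>', $text) ?? $text;
}

$words = ['radar', 'spy', 'رادار', 'تجسس'];
echo highlightBadWords('رادار وكاميرا تجسس المطلوبة.ولكن لا الرادارات', $words), PHP_EOL;
echo highlightBadWords('Radar and spy camera required, but not radars.', $words), PHP_EOL;
```

Combining all words into one alternation means the text is scanned once per call rather than once per word, and using `$0` in the replacement keeps the casing of the matched text instead of substituting the stored word.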