Hey everyone,

I am writing a simple profanity filter in PHP. Can anyone tell me why, in the following code, the filter works (it prints [explicit]) for the $vowels array but not for the $array array, which I am constructing from a text file?

 function clean($str){

$handle = fopen("badwords.txt", "r");
if ($handle) {
   while (!feof($handle)) {
       $array[] = fgets($handle, 4096);
   }
   fclose($handle);
}

$vowels = array("a", "e", "i", "o", "u", "A", "E", "I", "O", "U");

$filter = "[explicit]";
$clean = str_replace($array, $filter, $str);
return $clean;
 }

When using $vowels in place of $array, it works, except for lowercase vowels, which return:

 [[expl[explicit]c[explicit]t]xpl[explicit]c[explicit]t]

 instead of 

 [explicit]

Not sure why that is going on, either.

Any ideas?

Thanks!

+1  A: 

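Because the output of the filter contains lowercase vowels, which are also the characters you're filtering. In other words, you're creating a feedback loop: str_replace works through the search array in order, so each later vowel also matches inside the [explicit] text inserted by an earlier one.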
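
A minimal sketch of that feedback loop, reusing the $vowels array from the question. The single-pass alternative with strtr() is only a suggestion and is not something the question uses; the variable names $upper, $lower, $map and $single are made up for the illustration.

```php
<?php
$vowels = array("a", "e", "i", "o", "u", "A", "E", "I", "O", "U");
$filter = "[explicit]";

// Uppercase "A" is replaced once; none of the remaining search terms
// (E, I, O, U) occur in "[explicit]", so the result stays clean.
$upper = str_replace($vowels, $filter, "A");

// Lowercase "a" becomes "[explicit]" first, and then the later search
// terms "e" and "i" match inside that freshly inserted text.
$lower = str_replace($vowels, $filter, "a");

// strtr() with an array makes a single pass and does not rescan text
// it has already replaced.
$map = array_fill_keys($vowels, $filter);
$single = strtr("a", $map);

echo $upper, "\n", $lower, "\n", $single, "\n";
```

Here $lower reproduces the nested output shown in the question, while $upper and $single both come out as a plain [explicit].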

Don Neufeld
Good point! Thanks
behrk2
A: 

First off, file_get_contents is a much simpler function to read a file into a variable.

$badwords = explode("\n", file_get_contents('badwords.txt'));
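
On the file-reading point, one likely reason the $array version in the question never matches is that fgets() returns each line with its line ending still attached, so the search term is "word\n" rather than "word". The sketch below illustrates this with a temporary file and two placeholder words (both are assumptions made for the demonstration), and shows file() with its flags as another way to get a clean list.

```php
<?php
// Hypothetical word list written to a temporary file for the demonstration.
$path = tempnam(sys_get_temp_dir(), 'badwords');
file_put_contents($path, "darn\nheck\n");

// fgets() returns the line including its trailing newline.
$handle = fopen($path, 'r');
$first = fgets($handle, 4096);
fclose($handle);

// file() with these flags strips line endings and drops empty lines.
$words = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);

$input = 'well darn it';

// No match: the search term is "darn\n", which does not occur in the input.
$withNewline = str_replace($first, '[explicit]', $input);

// Match: the search terms are "darn" and "heck".
$withoutNewline = str_replace($words, '[explicit]', $input);

unlink($path);

echo $withNewline, "\n", $withoutNewline, "\n";
```

Note that explode("\n", ...) on a file ending in a newline leaves an empty string as the last element, which is worth trimming or skipping before building patterns from it.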

Second, preg_replace offers much more flexible string replacement options. - http://us3.php.net/preg_replace

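$patterns = array();
foreach($badwords as $word) {
    $word = trim($word);
    if ($word === '') {
        continue; // skip blank lines, which would otherwise match everywhere
    }
    // escape regex metacharacters so each word is matched literally
    $patterns[] = '/' . preg_quote($word, '/') . '/';
}

$replacement = '[explicit]';

$output = preg_replace($patterns, $replacement, $input);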
davethegr8
That's a pretty poor code example you've provided, as only the last word in badwords.txt will be replaced with the text '[explicit]'. If anything, you should simply remove the foreach and do the following: $output = preg_replace($badwords, $replacement, $input);
Andy
@andy - haha, oops. It was late last night and I forgot a []. :)
davethegr8
+2  A: 

Make sure you read:

Coding Horror: Obscenity Filters: Bad Idea, or Incredibly Intercoursing Bad Idea?

before you choose to continue on the road of string replacement...

Jacco
Skimmed through it, too tired to read all of it right now. Looks very interesting though, thanks!
behrk2
It basically states that you cannot succeed in filtering human language without interpreting it. Google for 'clbuttic'.
Jacco
+1  A: 

I modified Davethegr8's solution to get the following working example:

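 function clean($str){

// cache the compiled patterns so badwords.txt is read only once per request
global $clean_words; 

$replacement = '[explicit]';

if(empty($clean_words)){
 $badwords = explode("\n", file_get_contents('badwords.txt'));

 $clean_words = array();

 foreach($badwords as $word) {
     $word = trim($word);
     if ($word === '') {
         continue; // a blank line would produce a pattern that matches at every word boundary
     }
     // \b matches whole words only; preg_quote escapes regex metacharacters; i ignores case
     $clean_words[]= '/(\b' . preg_quote($word, '/') . '\b)/si';
 }
}

$out = preg_replace($clean_words, $replacement, $str);
return $out;
 }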
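
To illustrate why the \b word boundaries matter, here is a small self-contained sketch that contrasts plain substring replacement with whole-word matching. The helper name build_patterns and the in-memory word list are assumptions made for the example; the function above reads badwords.txt instead.

```php
<?php
// Hypothetical helper: builds one whole-word, case-insensitive pattern per word.
function build_patterns(array $badwords) {
    $patterns = array();
    foreach ($badwords as $word) {
        $word = trim($word);
        if ($word === '') {
            continue; // skip blank entries from the word list
        }
        $patterns[] = '/\b' . preg_quote($word, '/') . '\b/i';
    }
    return $patterns;
}

// Placeholder list, including a blank entry and a stray carriage return.
$badwords = array("ass", "", "heck\r");
$patterns = build_patterns($badwords);

// Plain substring replacement rewrites the inside of an innocent word.
$substring = str_replace("ass", "butt", "a classic mistake");

// Whole-word matching leaves "classic" alone...
$untouched = preg_replace($patterns, '[explicit]', "a classic mistake");

// ...but still catches standalone words, regardless of case.
$caught = preg_replace($patterns, '[explicit]', "What the HECK, you ass");

echo $substring, "\n", $untouched, "\n", $caught, "\n";
```

Word boundaries avoid the "clbuttic" problem mentioned in the other answer, though they do nothing against deliberate misspellings or spacing tricks.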
behrk2
A: 

There are web services out there now that filter profanity. For example: WebPurify

jfreger