
Hello, I have set up an array of censored words and I want to check that a user submitted comment doesn't contain any of these words. What is the most efficient way of doing this? All I've come up with so far is splitting the string into an array of words and checking it against the array of censored words, but I've a feeling there's a neater way of doing this.

A:

I'd loop over your array of censored words and use strpos() (or stripos() to ignore case) to find out whether each word is present in the text.
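A minimal sketch of that loop, assuming a hypothetical helper named containsCensoredWord(). It uses stripos() so the check ignores case. Note that this is a substring check, not a whole-word check.

```php
<?php
// Hypothetical helper: true if $text contains any entry of $censoredWords.
// This is a substring check, so a censored "ass" would also match "class".
function containsCensoredWord(string $text, array $censoredWords): bool
{
    foreach ($censoredWords as $word) {
        // stripos() returns the match position (which may be 0) or false,
        // so the result must be compared strictly against false.
        if (stripos($text, $word) !== false) {
            return true;
        }
    }
    return false;
}

var_dump(containsCensoredWord('Some WORD1 and lorem ipsum', array('word1', 'word2'))); // bool(true)
```

The strict `!== false` comparison matters: a word found at the very start of the text returns position 0, which a loose check would treat as "not found".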

Palantir
A: 

A simple way would be to use the in_array() function on each word of the submitted text.

Usage:

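$censoredWords = array('word1', 'word2', 'word3');

$userSubmitted = 'Some word1 and lorem ipsum dolor sit amet';

// in_array() looks for a single value, so test each submitted word in turn
foreach (explode(' ', $userSubmitted) as $word)
{
    if (in_array($word, $censoredWords))
    {
        // do something
        break;
    }
}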

You can also build a single pattern with implode('|', $censoredWords) and test it with preg_match(), depending on what you're trying to achieve.
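A sketch of the preg_match() approach, assuming a hypothetical helper named matchesCensoredWord() and a word list of plain alphanumeric words. preg_quote() escapes any regex metacharacters, and the `\b` word boundaries (an addition, not part of the answer above) keep "word1" from matching inside "word10".

```php
<?php
// Hypothetical helper: builds one alternation pattern such as /\b(?:word1|word2)\b/i
// and tests the text with a single preg_match() call.
function matchesCensoredWord(string $text, array $censoredWords): bool
{
    // An empty alternation would match at any word boundary, so bail out early.
    if ($censoredWords === array()) {
        return false;
    }
    // preg_quote() escapes regex metacharacters; '/' is passed because it is the delimiter.
    $escaped = array_map(function ($w) { return preg_quote($w, '/'); }, $censoredWords);
    // \b word boundaries avoid matches inside longer words; the i flag ignores case.
    $pattern = '/\b(?:' . implode('|', $escaped) . ')\b/i';
    return preg_match($pattern, $text) === 1;
}

var_dump(matchesCensoredWord('Some Word1, and lorem ipsum', array('word1', 'word2'))); // bool(true)
var_dump(matchesCensoredWord('Some word10 and lorem ipsum', array('word1', 'word2'))); // bool(false)
```

Unlike splitting on spaces, this also catches a word followed by punctuation ("Word1,").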

Note that any method that tries to detect censored words will probably give false positives (a plain substring check for "ass", for example, also matches "class").

The best way to control this would be to use a flagging function and ask your visitors to notify a moderator (as is done on SO).

Unless you write a complete filtering algorithm, automatic detection will never be fully effective and will still have flaws.

Boris Guéry
A: 

The most efficient approach is an array of forbidden words, but to be efficient you have to censor on submission, not on display (the original content can be kept in another DB column if needed).
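To illustrate censoring once at submission time, here is a sketch that returns both versions of a comment for storage. The helper name and the column names body_original and body_censored are hypothetical, and the actual database INSERT is left out.

```php
<?php
// Hypothetical helper: run the filter once, when the comment is submitted,
// and return both versions so they can be stored in two DB columns.
function prepareCommentForStorage(string $text, array $forbidden, string $replacement = '****'): array
{
    return array(
        'body_original' => $text,                                         // kept for moderators
        'body_censored' => preg_replace($forbidden, $replacement, $text), // shown to visitors
    );
}

$row = prepareCommentForStorage('That darn cat', array('/\bdarn\b/i'));
echo $row['body_censored'], "\n"; // That **** cat
```

Page views then read body_censored directly, so the filter cost is paid once per comment rather than once per display.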

This array can be managed and retrieved from a database, a text file, PHP code, etc.

The array can hold plain strings or regular expressions; regexps are helpful if you want to censor variations of a word.

For the string version, you can use strtr():

$replacement = "****";
$text = strtr($text, array("fuck" => $replacement, "fuckin" => $replacement));
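When given an array, strtr() tries the longest matching key first, which is why listing both a word and its longer variant is safe. A sketch, assuming the list is a plain array of words (the words below are placeholders), that builds the replacement map with array_fill_keys(). Keep in mind that strtr() is case-sensitive and ignores word boundaries.

```php
<?php
// Placeholder word list; in practice it could come from a DB or a text file.
$words = array('darn', 'darnit');

// array_fill_keys() maps every word to the same replacement string.
$map = array_fill_keys($words, '****');

// strtr() tries the longest matching key at each position, so "darnit"
// is replaced as a whole rather than leaving "****it".
echo strtr('darn it, darnit', $map), "\n"; // **** it, ****
```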

For the regexp version, use preg_replace(), which accepts an array of forbidden patterns:

$replacement = "****";
// the trailing ? also matches the bare word; the i flag ignores case
$forbidden = array('/fuck(in|er)?/i', '/censor(ed|ship)?/i');
$text = preg_replace($forbidden, $replacement, $text);

You can improve the replacement with a callback (preg_replace_callback()) that puts the exact number of * characters in the censored text.
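A sketch of that callback version, using placeholder patterns. preg_replace_callback() accepts an array of patterns just like preg_replace(); the callback receives each match and returns a run of asterisks of the same length.

```php
<?php
// Placeholder patterns; the i flag makes them case-insensitive.
$forbidden = array('/\bdarn(it)?\b/i', '/\bheck\b/i');

// Replace each match with as many '*' characters as it has bytes.
// strlen() counts bytes; use mb_strlen() for multibyte (e.g. UTF-8) words.
$mask = function (array $m) {
    return str_repeat('*', strlen($m[0]));
};

$text = preg_replace_callback($forbidden, $mask, 'Darnit, what the heck');
echo $text, "\n"; // ******, what the ****
```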

Benoit