Hi,

For a website that takes input from kids, we need to filter any naughty / bad words they use when they enter comments on the website (running PHP).

The comments are a free-text field and users can enter whatever they want. The solution I can think of is to keep a word list like BLACKLIST: bad,bad,word,woord,craap,craaaap, (we can fill this with all the blacklisted words).

Then, when the form is saved, we check the comment against the list, and if any of the words are present we do not allow the comment to be saved.

BUT the problem with this method is that users can get around it by adding letters to a word so that it skips the filter, e.g. shiiiiit

Let me know what you think is the best way to create some filter for these words.

+4  A: 

You're never going to be able to filter every permutation. Perhaps the most feasible solution is to filter the obvious, and implement a "Report Abuse" mechanism so someone can manually look over (and reject) suspect comments.
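
As a rough sketch of "filter the obvious", the snippet below flags a comment for manual review when it contains a listed word, including stretched spellings such as "craaaap". It is written in Python for brevity; on a PHP site `preg_match` could play the same role. The word list, the function names, and the choice to queue for review rather than reject are all assumptions made for illustration, not a complete filter.

```python
import re

# Hypothetical blacklist, using the placeholder words from the question.
BLACKLIST = ["bad", "crap"]

def stretched_pattern(word):
    """Turn 'crap' into 'c+r+a+p+' so every letter may be repeated."""
    return "".join(re.escape(ch) + "+" for ch in word)

# One alternation of all stretched words, matched as whole words only.
BAD_WORD_RE = re.compile(
    r"\b(?:" + "|".join(stretched_pattern(w) for w in BLACKLIST) + r")\b",
    re.IGNORECASE,
)

def needs_review(comment):
    """Return True if the comment should go to a human moderator."""
    return BAD_WORD_RE.search(comment) is not None
```

This only catches repeated letters; inserted spaces, digit substitutions, and look-alike characters still get through, which is why the "Report Abuse" mechanism remains necessary.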

Brian Agnew
+3  A: 

So you are going to ban shit, shït, shıt, śhit, and śhiŧ?

Blacklisting is not a viable solution in the Unicode age. Yet banning € outright seems excessive.
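
To illustrate how far normalization gets you, here is a minimal Python sketch that strips accents via Unicode decomposition before any list lookup (in PHP, the intl extension's `Normalizer` class offers the same normalization forms). The function name `strip_accents` is hypothetical. Note that it folds only some of the variants above: letters such as the dotless ı and the stroked ŧ have no decomposition and pass through unchanged.

```python
import unicodedata

def strip_accents(text):
    """Decompose characters (NFKD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

for variant in ["shït", "śhit", "shıt", "śhiŧ"]:
    print(variant, "->", strip_accents(variant))
```

The first two variants fold to the plain ASCII spelling, while the last two keep their non-ASCII letter, so a blacklist lookup after normalization would still miss them.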

MSalters
A: 

There is also always the risk of filtering a harmless word like "bass", which of course contains one of the words that is not permitted. At the moment, a few good moderators seem like the best solution to such a problem.

tDo
More problematic is that "ass" is only obscene in certain contexts. In other situations, it's the name of a kind of animal.
troelskn
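
A small Python sketch of the two failure modes just described, with hypothetical function names: a substring check wrongly flags "bass", while a whole-word check avoids that but still cannot tell the animal from the insult.

```python
import re

# Single-entry blacklist, taken from the example in this answer.
BLACKLIST = ["ass"]

def substring_hits(text):
    """Naive check: flags any text that contains a listed word anywhere."""
    lowered = text.lower()
    return [w for w in BLACKLIST if w in lowered]

def whole_word_hits(text):
    """Stricter check: flags only listed words that appear as whole words."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return [w for w in BLACKLIST if w in words]
```

For "I play bass guitar" the substring check reports a hit and the whole-word check does not. For "The ass is a kind of animal" both report a hit, even though the sentence is harmless, so context still needs a human.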
+2  A: 

If you have enough time, it is worthwhile reading about the Scunthorpe problem.

Jeff Atwood also has a post on the futility of obscenity filters.

too much php
That was some good reading. I especially like the last one: in June 2008, a news site run by the American Family Association censored an Associated Press article on sprinter Tyson Gay, replacing instances of "gay" with "homosexual", thus rendering his name as "Tyson Homosexual".
Alix Axel
A: 

Use uClassify to train a classifier on bad comments; once the system is trained well enough, you can flag the offending comments for moderation.

Alix Axel
+1  A: 

Thanks to too much php I've found some links which might be a solution for your case:

Alix Axel