ansaurus

Question

How to best implement swear words handler (.NET preferred)?

Answer 1

+15 A:

Obscenity Filters: Bad Idea, or Incredibly Intercoursing Bad Idea? ^_^

Also see How do you implement a good profanity filter?.

PhiLho 2008-11-18 17:24:02

Just about to post it :) +1

Sunny 2008-11-18 17:24:40

That was my last +1 for the day... Six hours to reset. But it was worth it.

Chris Charabaruk 2008-11-18 17:35:50

Answer 2

+6 A:

The only way to win is not to play.

Consider the following sentence:

"Edward II was one of only a handful of monarchs to give birth to a recorded bastard."

Bastard is a borderline swear word, but in this context it is a completely sensible term.

Consider also:

"The molten slag fell out of the crucible."
"The bitch sniffed the other dog's backside."

You are never going to be able to build a parser that can work out whether a given usage is legitimate. Even if you decide to go ahead and just star out those words, the filter is easily subverted.
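To make both failure modes concrete, here is a minimal sketch of a naive word-list filter. The `BLOCKED` list and the `star_out` helper are hypothetical names made up for this illustration (the words come from the example sentences above), and it is written in Python for brevity; the same pattern would work with .NET's `Regex` class.

```python
import re

# Hypothetical block list, taken from the example sentences above.
BLOCKED = ["bastard", "slag", "bitch"]

# Match any blocked word as a whole word, ignoring case.
PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, BLOCKED)) + r")\b",
    re.IGNORECASE,
)

def star_out(text):
    """Keep the first letter of each blocked word and star out the rest."""
    def mask(match):
        word = match.group(0)
        return word[0] + "*" * (len(word) - 1)
    return PATTERN.sub(mask, text)

# False positive: an innocent metallurgy sentence gets censored.
print(star_out("The molten slag fell out of the crucible."))

# Trivial evasion: one substituted character slips past the list.
print(star_out("What a b1tch."))
```

The first sentence comes back with "slag" starred out even though nothing offensive was said, while the second passes through untouched. Adding more patterns shifts the balance between the two errors but does not remove either.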

Ask yourself: is "Tw*t" really that much less offensive than "twat"? Everyone knows which word you're pointing to, and everyone understands what it means.

Ultimately, the solution to this problem is not technological. What you really want is a human moderator of some sort to get rid of the people who swear. A human moderator has a facility that algorithms never will: the ability to exercise judgement. Using that judgement is far more useful than throwing computer science at the problem.

This is discussed at length in the other answers to this question.

Simon Johnson 2008-11-18 17:42:31

Er... what's the bad word in that second example? Slag? Crucible? Molten?

Kyralessa 2008-11-18 18:44:11

@Kyralessa, it's "slag." It's a Brit thing.

Robert S. 2008-11-18 19:09:31

Answer 3

+1 A:

Well, what we (*) did is create a two-tiered list of "bad words" (using regex to hopefully catch some variations). Using a Tier 1 word gets you a warning saying that you are violating the Terms of Service, and you cannot save the message until you fix it. If you use a Tier 2 word, the message is posted, but an objection is automatically filed against it. All messages with an objection flagged (either system- or user-generated) are reviewed by a human, who determines whether each one stays or goes.

(*) "We" being the e-commerce arm of a large, staid brick-and-mortar chain store, which has just started allowing user-generated content on its website.
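The two-tier flow described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the actual implementation: the `TIER1` and `TIER2` patterns are placeholders, and `Verdict`, `classify`, `submit`, and `review_queue` are hypothetical names. It is written in Python for brevity; the regexes port directly to .NET's `Regex` class.

```python
import re
from enum import Enum

class Verdict(Enum):
    ACCEPT = "accept"  # post normally
    FLAG = "flag"      # post, but file an objection for human review
    REJECT = "reject"  # refuse to save until the author edits the message

# Placeholder patterns (assumptions); a real site would maintain its own lists.
# Tier 1: always unacceptable. The trailing "+" catches stretched spellings.
TIER1 = [re.compile(r"\bbadword+\b", re.IGNORECASE)]
# Tier 2: context-dependent. The character class catches a simple substitution.
TIER2 = [re.compile(r"\bsl[a@]g\b", re.IGNORECASE)]

# Stand-in for the objection queue that a human moderator works through.
# User-filed objections would be appended to the same queue.
review_queue = []

def classify(message):
    """Return the strictest verdict triggered by any pattern."""
    if any(p.search(message) for p in TIER1):
        return Verdict.REJECT
    if any(p.search(message) for p in TIER2):
        return Verdict.FLAG
    return Verdict.ACCEPT

def submit(message):
    """Classify a message and queue it for human review if it was flagged."""
    verdict = classify(message)
    if verdict is Verdict.FLAG:
        review_queue.append(message)
    return verdict
```

Checking Tier 1 first means a message that contains words from both tiers is rejected outright rather than posted and flagged. Every flagged message still ends with a human decision.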

James Curran 2008-11-18 18:32:08

Seems like a sensible way of doing it to me. There are some words that, under any context, are just plain "bad", and others that depend on context, as has been pointed out.

Evan 2009-02-18 03:50:04

Answer 4

+1 A:

This is why REAL programming languages have a request.getUserIntent() method.

if(request.getUserIntent() == Intent.INSULTING) {
    rejectInput();
}

Adam Jaskiewicz 2008-11-18 18:57:38

True, but only interpreted ones... ;)

ChrisA 2008-11-18 19:14:53
