I'm thinking of something like:

foreach (var word in paragraph.Split(' ')) {
  if (badWordArray.Contains(word)) {
    // do something about it
  }
}

but I'm sure there's a better way.

Thanks in advance!

UPDATE: I'm not looking to remove obscenities automatically... for my web app, I want to be notified if a word I deem "bad" is used. Then I'll review it myself to make sure it's legit. An auto-flagging system of sorts.

+8  A: 

While your way works, it may be a bit time-consuming. There is a wonderful response here to a previous SO question. Though that question is about PHP rather than C#, I think it can be ported easily.

Edit to add sample code:

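// Requires: using System.Text.RegularExpressions;
public string FilterWords(string inputWords) {
    // Matches the listed words anywhere in the input, including inside longer words.
    Regex wordFilter = new Regex("(puppies|kittens|dolphins|crabs)");
    return wordFilter.Replace(inputWords, "<3");
}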

That should work for you, more or less.

Edit to answer OP clarification:

I'm not looking to remove obscenities automatically... for my web app, I want to be notified if a word I deem "bad" is used.

Much like the replacement above, you can check whether anything matches like so:

// Requires: using System.Text.RegularExpressions;
public bool HasBadWords(string inputWords) {
    Regex wordFilter = new Regex("(puppies|kittens|dolphins|crabs)");
    return wordFilter.IsMatch(inputWords);
}

It will return true if the string you passed to it contains any words in the list.
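
For a flagging workflow it may help to know which words matched, not just whether any did. Here is a minimal sketch, assuming whole-word, case-insensitive matching is what you want. The class name WordFlagger and the method FindFlaggedWords are made up for illustration, and the word list is the same placeholder list as above. Note that with the \b anchors it will no longer catch words embedded in longer ones.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class WordFlagger
{
    // Placeholder list; in a real app this would come from configuration or a database.
    private static readonly string[] BadWords = { "puppies", "kittens", "dolphins", "crabs" };

    // \b on both sides restricts matches to whole words.
    // Regex.Escape guards against list entries that contain regex metacharacters.
    private static readonly Regex WholeWordFilter = new Regex(
        @"\b(" + string.Join("|", BadWords.Select(w => Regex.Escape(w))) + @")\b",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // Returns each distinct flagged word (lowercased) found in the input.
    public static List<string> FindFlaggedWords(string input)
    {
        return WholeWordFilter.Matches(input)
            .Cast<Match>()
            .Select(m => m.Value.ToLowerInvariant())
            .Distinct()
            .ToList();
    }
}
```

An empty result means nothing was flagged; otherwise the returned list can be attached to whatever notification you send yourself for review.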

rakuo15
Usually "less".
Joel Coehoorn
If you're going to do this, **don't forget the `\b`**. It's a clbuttic mistake.
JSBangs
+1 for ass and boundary
Jim Schubert
Haha well done. The word boundary is important for sure, but if you want to filter for things like `redkittens` or `crabsapples`, this would do it.
rakuo15
Thank you, I think a combination of your answer and Detmar's is what I'll end up doing. Much appreciated.
Chad
A: 

You could consider using a HashSet<T> or a Dictionary<TKey, TValue> instead of the array. A hash-based lookup (HashSet<T>.Contains() or Dictionary.ContainsKey()) takes roughly constant time on average, whereas .Contains() on an array has to scan the elements one by one. This is especially true if you have a large list of profanities (not sure how many there are! :)
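
A minimal sketch of the hash-based lookup, assuming a case-insensitive comparison and a simple split on whitespace and common punctuation. The class name BadWordLookup, the method FindBadWords, and the word list are placeholders for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class BadWordLookup
{
    // Placeholder list; OrdinalIgnoreCase makes "Kittens" and "kittens" equivalent.
    private static readonly HashSet<string> BadWords = new HashSet<string>(
        new[] { "puppies", "kittens", "dolphins", "crabs" },
        StringComparer.OrdinalIgnoreCase);

    // Separators are an assumption; extend the set to suit your input.
    private static readonly char[] Separators =
        { ' ', '\t', '\r', '\n', '.', ',', ';', ':', '!', '?' };

    // Returns every word of the paragraph that appears in the bad-word set.
    public static List<string> FindBadWords(string paragraph)
    {
        var found = new List<string>();
        foreach (string word in paragraph.Split(Separators, StringSplitOptions.RemoveEmptyEntries))
        {
            if (BadWords.Contains(word))
            {
                found.Add(word);
            }
        }
        return found;
    }
}
```

Each lookup is independent of the size of the list, so the cost is driven by the number of words in the paragraph rather than the number of entries you are checking against.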

AlexW
+3  A: 

Bad Idea™.

http://www.codinghorror.com/blog/2008/10/obscenity-filters-bad-idea-or-incredibly-intercoursing-bad-idea.html

Jaryl
I'm not really trying to use the filter in the manner explained in the article.
Chad
+1  A: 

At my job we put some automatic bad word filtering into our software (it's kind of shocking to be browsing the source and suddenly run across the array containing several pages of obscenity).

One tip is to pre-process the user input before testing against your list, in case someone is trying to sneak something by you. By way of preprocessing, we

  • uppercase everything in the input
  • remove most non-alphanumerics (that is, just splice out any spaces, or punctuation, etc.)
  • and then, assuming someone is trying to pass off digits for letters, do something like this: replace zero with O, 9 with G, 5 with S, etc. (get creative)

And then get some friends to try to break it. It's fun.
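
The preprocessing steps above can be sketched roughly as follows. This is not our production code, just an illustration: the names InputNormalizer, Normalize and LooksSuspicious are made up, and the 1, 3 and 4 substitutions are extra guesses beyond the three listed above. Because spaces are spliced out, the check is a substring test, so expect false positives that need a human review.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

public static class InputNormalizer
{
    // 0, 9 and 5 come from the list above; 1, 3 and 4 are assumptions added for illustration.
    private static readonly Dictionary<char, char> DigitToLetter = new Dictionary<char, char>
    {
        { '0', 'O' }, { '9', 'G' }, { '5', 'S' },
        { '1', 'I' }, { '3', 'E' }, { '4', 'A' }
    };

    // Uppercases the input, drops non-alphanumerics, then maps look-alike digits to letters.
    public static string Normalize(string input)
    {
        var sb = new StringBuilder(input.Length);
        foreach (char c in input.ToUpperInvariant())
        {
            if (!char.IsLetterOrDigit(c))
            {
                continue; // splice out spaces, punctuation, etc.
            }
            sb.Append(DigitToLetter.TryGetValue(c, out char letter) ? letter : c);
        }
        return sb.ToString();
    }

    // The bad-word list is expected to be uppercase already, to match the normalized text.
    public static bool LooksSuspicious(string input, IEnumerable<string> upperCaseBadWords)
    {
        string normalized = Normalize(input);
        return upperCaseBadWords.Any(w => normalized.Contains(w));
    }
}
```

For example, Normalize("d0lph1n5") yields "DOLPHINS", which a plain word-by-word comparison against the raw input would have missed.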

Detmar
I like this... simple and effective for my purposes. Thanks.
Chad
A: 

The best way to parse for "bad words" is not String.Contains, and certainly not Regex or other crazy solutions.

The best way to filter "bad words" is to use Natural Language Processing.

+50 Bounty for the solution!

Carlos Muñoz