ansaurus

Question

Answer 1

+2 A:

You can use the string.Replace() method or the Regex class.
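Here is a minimal sketch of the Regex option. It assumes the bad words are already in memory, and the word list and the `Censor` helper are hypothetical. Each word is escaped with Regex.Escape, the words are joined into a single alternation, and the alternation is wrapped in `\b` word boundaries so that only whole words are replaced.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical word list; in practice it would come from your own store.
string[] badWords = { "bad", "worse", "worst" };

// Escape each word so regex metacharacters are matched literally, then join
// them into one alternation anchored at word boundaries:
// \b(?:bad|worse|worst)\b
string pattern = @"\b(?:" + string.Join("|", badWords.Select(Regex.Escape)) + @")\b";
var matcher = new Regex(pattern, RegexOptions.IgnoreCase);

// Hypothetical helper: replace every whole-word match with a fixed marker.
string Censor(string text) => matcher.Replace(text, "[Censored]");

Console.WriteLine(Censor("A bad day, a worse night, the Worst week."));
```

Using one combined pattern means the text is scanned once instead of once per word. Because of the `\b` anchors, "badminton" is left alone.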

Arseny 2010-07-27 08:56:05

Answer 2

+4 A:

Although I'm a big fan of Regex, I think it won't help you here. You should fetch your bad words into a List<string> or a string array and use System.String.Replace on your incoming message.
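Here is a minimal sketch of the list-plus-String.Replace idea. A hypothetical in-memory list stands in for the words fetched from the database. String.Replace(string, string) performs an ordinal, case-sensitive substring replacement. As a result, it also censors the "bad" inside "badminton", while "BAD" is left untouched.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical list standing in for the words loaded from your bad-word table.
var badWords = new List<string> { "bad", "worse", "worst" };

// Plain substring replacement: each bad word is replaced wherever it occurs,
// including inside longer, innocent words.
string CensorWithReplace(string message)
{
    foreach (string word in badWords)
    {
        message = message.Replace(word, "[Censored]");
    }
    return message;
}

Console.WriteLine(CensorWithReplace("This is bad, badminton is worse."));
```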

Perhaps better, use the System.String.Split and String.Join methods:

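using System;

string mayContainBadWords = "... bla bla ...";
string[] badWords = new string[] { "bad", "worse", "worst" };

// Cut the message at every bad word; None keeps empty pieces, so a leading,
// trailing or repeated bad word still gets its marker when joined back.
string[] temp = mayContainBadWords.Split(badWords, StringSplitOptions.None);
string cleanString = string.Join("[Censored]", temp);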

In the sample, mayContainBadWords is the string you want to check, badWords is a string array that you load from your bad-word SQL table, and cleanString is your result.
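The choice of StringSplitOptions matters for the Split/Join approach. This small sketch uses a made-up input that starts with a bad word. With None, the empty piece before the leading "bad" is kept, so Join still puts a marker there. RemoveEmptyEntries drops that piece, and the word disappears without a trace. Markers can go missing the same way at the end of the string or between two adjacent bad words.

```csharp
using System;

string[] badWords = { "bad", "worse", "worst" };
string text = "bad start, worst end"; // made-up input for illustration

// Split on every occurrence of any bad word, then glue the pieces back
// together with the censor marker in between.
string JoinCensored(StringSplitOptions options) =>
    string.Join("[Censored]", text.Split(badWords, options));

// None keeps the empty piece before the leading "bad", so its marker survives.
Console.WriteLine(JoinCensored(StringSplitOptions.None));
// RemoveEmptyEntries drops that empty piece, so the leading "bad" vanishes
// without a marker.
Console.WriteLine(JoinCensored(StringSplitOptions.RemoveEmptyEntries));
```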

Hinek 2010-07-27 09:00:08

which would turn badminton into [Censored]minton

Rune FS 2010-07-27 09:44:21

Exactly! :D But seriously, this is just a sample, not a solution... I see no improvement in using regex here.

Hinek 2010-07-27 14:10:55

Answer 3

+11 A:

Please see this "clbuttic" (or for your case cl[Censored]ic) article before doing a string replace without considering word boundaries:

http://www.codinghorror.com/blog/2008/10/obscenity-filters-bad-idea-or-incredibly-intercoursing-bad-idea.html

Update

Obviously this is not foolproof (see the article above: this approach is easy to get around and can produce false positives) or optimized (the regular expressions should be cached and compiled), but the following will filter out whole words (no "clbuttics") and simple plurals of words:

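using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

const string CensoredText = "[Censored]";
// \b anchors each match at word boundaries; (s?) also catches simple plurals.
const string PatternTemplate = @"\b({0})(s?)\b";
const RegexOptions Options = RegexOptions.IgnoreCase;

string[] badWords = new[] { "cranberrying", "chuffing", "ass" };

// One Regex per bad word; Regex.Escape makes sure a word containing regex
// metacharacters is matched literally.
IEnumerable<Regex> badWordMatchers = badWords.
    Select(x => new Regex(string.Format(PatternTemplate, Regex.Escape(x)), Options));

string input = "I've had no cranberrying sleep for chuffing chuffings days - " +
    "the next door neighbour is playing classical music at full tilt!";

// Run each matcher over the result of the previous one.
string output = badWordMatchers.
   Aggregate(input, (current, matcher) => matcher.Replace(current, CensoredText));

Console.WriteLine(output);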

Gives the output:

I've had no [Censored] sleep for [Censored] [Censored] days - the next door neighbour is playing classical music at full tilt!

Note that "classical" does not become "cl[Censored]ical", as whole words are matched with the regular expression.
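This quick sketch shows why "classical" survives. It reuses the answer's pattern template for the single word "ass", and the candidate strings are made up for illustration. The sketch also shows a limit of the `(s?)` group: it only covers plurals formed by adding a single "s", so "asses" is not caught.

```csharp
using System;
using System.Text.RegularExpressions;

// The answer's template, instantiated for the single word "ass".
var matcher = new Regex(@"\b(ass)(s?)\b", RegexOptions.IgnoreCase);

// \b only matches between a word character and a non-word character (or the
// start/end of the string). "ass" inside "classical" has a letter on both
// sides, so there is no boundary and it is left alone. "asses" fails because
// (s?) allows only one extra "s" before the closing \b.
foreach (string candidate in new[] { "ass", "ASS", "(ass)", "classical", "asses" })
{
    Console.WriteLine($"{candidate}: {matcher.IsMatch(candidate)}");
}
```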

Update 2

And to demonstrate a flavour of how easily this (and basic string/pattern matching techniques in general) can be subverted, see the following string:

"I've had no cranberryıng sleep for chuffıng chuffıngs days - the next door neighbour is playing classical music at full tilt!"

I have replaced the "i"s with Turkish lowercase dotless "ı"s. It still looks pretty offensive!
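This minimal sketch shows why the trick works and tries one partial counter-measure. The `Fold` helper and its one-entry look-alike map are hypothetical. Folding only the dotless i leaves every other look-alike character (Cyrillic letters, full-width forms and so on) untouched, so this is far from a fix.

```csharp
using System;
using System.Text.RegularExpressions;

// Same whole-word template as in the answer, for one of its example words.
var matcher = new Regex(@"\b(chuffing)(s?)\b",
    RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

// "chuffıng" spelled with U+0131 LATIN SMALL LETTER DOTLESS I instead of 'i'.
string spoofed = "chuff\u0131ng";

// The dotless i is a different character, so the pattern does not match it,
// even though the match is case-insensitive.
Console.WriteLine(matcher.IsMatch(spoofed));

// Hypothetical counter-measure: fold look-alike characters to their ASCII
// twin before matching. This map has a single entry; a real one would need
// many more and would still miss things.
string Fold(string text) => text.Replace('\u0131', 'i');

Console.WriteLine(matcher.IsMatch(Fold(spoofed)));
```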

chibacity 2010-07-27 09:01:50

Good background article. I would probably have posted it as a comment rather than an answer, though, as it doesn't really answer the question.

Robin Day 2010-07-27 09:10:03

@Robin I will burn in hell, but I provided an example.

chibacity 2010-07-27 09:37:14

@chibacity: +1. Whilst it's a hard, perhaps impossible, problem to solve (I've seen arcades filled with the name CLINT due to the font used!), I would still rather see a bunch of S[Censored]horpe mistakes than a kid's browser filled with obscenities.

Robin Day 2010-07-27 09:45:15

@Robin The above approach would give "Scunthorpe" which I think is more appropriate. Children are naturally curious.

chibacity 2010-07-27 09:59:31

Answer 4

+1 A:

There is also a nice article about it, which can be found here

With a little HTML-parsing skill, you can get a large list of swear words from noswear

Nealv 2010-07-27 09:51:28

Replace Bad words using Regex
