ansaurus

Question

Basic Profanity Filter in Objective C for iPhone

Answer 1

+5 A:

Obscenity Filters: Bad Idea, or Incredibly Intercoursing Bad Idea?

Jeff has an interesting article to consider before embarking on such a piece of code:

http://www.codinghorror.com/blog/2008/10/obscenity-filters-bad-idea-or-incredibly-intercoursing-bad-idea.html

Mark Harrison 2010-05-12 02:28:53

Agreed, obscenity filters are a terrible idea.

Michael Aaron Safyan 2010-05-12 02:34:21

I would normally be all for avoiding a profanity filter; it seems like a waste of time to me too... BUT the client (aka the customer) has specifically requested it, with an emphasis on disliking profanity. I realise 100% that my solution is a 'bandaid', but I need something to ship that at least solves the most general case. Version 2.0 will use more realistic filtering and will involve a social reporting tool, aka 'report this user'. But until then, a bandaid is what I need.

David van Dugteren 2010-05-12 02:52:56

@David, that sounds reasonable.

Mark Harrison 2010-05-12 05:50:44

Answer 2

+1 A:

Well, searching in that manner is certainly not the most efficient way to search for profanity... a more efficient approach would be to construct a finite state automaton to detect the words, and run the text once through that FSA. You don't really need to split strings to find profanity, and all that splitting adds extra allocation and copying overhead that you don't need. Also, there may be common patterns in some of the blacklisted words, which you are not exploiting by searching each word individually.
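To make the FSA idea concrete, here is a minimal sketch in plain C (which compiles unchanged inside an Objective-C .m file). It uses Aho-Corasick, a standard way to compile a whole word list into one automaton; the answer above doesn't name a specific algorithm, so that choice is mine. The function names, the 26-letter alphabet and the MAX_NODES limit are illustrative assumptions. Non-letters reset the automaton, so a listed word can't span a space, and matching is by substring, so it has the same embedded-word problem another answer raises with "ASSume".

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

#define MAX_NODES 4096 /* assumption: ample for a few hundred short words */
#define ALPHABET 26    /* assumption: only ASCII letters a-z are matched */

/* One automaton state. After build_automaton(), next[] is a complete DFA
   transition table and out counts every listed word ending at this state. */
typedef struct {
    int next[ALPHABET];
    int fail;
    int out;
} State;

static State states[MAX_NODES]; /* state 0 is the root; zero-initialised */
static int state_count = 1;

/* Maps a character to 0..25, or -1 for anything that is not a letter. */
static int letter_index(char ch) {
    int c = tolower((unsigned char)ch);
    return (c >= 'a' && c <= 'z') ? c - 'a' : -1;
}

/* Inserts one blacklisted word into the trie (non-letters are skipped). */
static void add_word(const char *word) {
    int s = 0;
    for (; *word; word++) {
        int c = letter_index(*word);
        if (c < 0) continue;
        if (states[s].next[c] == 0) {
            assert(state_count < MAX_NODES);
            states[s].next[c] = state_count++;
        }
        s = states[s].next[c];
    }
    if (s != 0) states[s].out++; /* ignore words with no letters */
}

/* Breadth-first pass, called once after all add_word() calls: computes
   failure links and fills in missing transitions so scanning never backtracks. */
static void build_automaton(void) {
    int queue[MAX_NODES];
    int head = 0, tail = 0;
    for (int c = 0; c < ALPHABET; c++)
        if (states[0].next[c]) queue[tail++] = states[0].next[c]; /* fail = root */
    while (head < tail) {
        int s = queue[head++];
        int f = states[s].fail;
        states[s].out += states[f].out; /* inherit words ending in a suffix */
        for (int c = 0; c < ALPHABET; c++) {
            int t = states[s].next[c];
            if (t) {
                states[t].fail = states[f].next[c];
                queue[tail++] = t;
            } else {
                states[s].next[c] = states[f].next[c];
            }
        }
    }
}

/* Scans text once; returns how many listed words occur (as substrings). */
static int count_matches(const char *text) {
    int s = 0, hits = 0;
    for (; *text; text++) {
        int c = letter_index(*text);
        if (c < 0) { s = 0; continue; } /* spaces/punctuation end a word */
        s = states[s].next[c];
        hits += states[s].out;
    }
    return hits;
}
```

Usage: call add_word() for each blacklisted word, then build_automaton() once, then count_matches(text) per message. Each message is scanned in a single pass with one table lookup per character, however long the word list is, and no splitting or copying is needed.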

That said, I think 400 words is quite a lot. Who, exactly, is your audience? What if a user has a medical question? Should such questions actually be disallowed? I can only think of a handful of words that would be considered profane in any context, so you might want to rethink the filtering.

Michael Aaron Safyan 2010-05-12 02:30:18

You're right, the list is tentative; I downloaded it from a forum. The final list will likely be cut down, but it's something that will be altered as time goes on. I want to use the list as a base/starting point before pitching it to the customer, who can ultimately decide what needs to be there and what doesn't. I'm looking into using an FSA/regex approach rather than a linear while loop. I'm just not that familiar with the iPhone SDK yet, so once I suss out the best way to do it, I'll go for a grep-style approach.

David van Dugteren 2010-05-12 02:58:43

I'm going ahead and using regexlite; that should be a little more efficient.

David van Dugteren 2010-05-12 03:43:15

Answer 3

+1 A:

I just have a suggestion for tokenizing the string. Your way works well if the words are all separated by spaces, but that is rarely the case in most usage scenarios, as you would normally have to deal with newlines, punctuation, etc. Try this if you are interested:

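// bigString is the text to check (an NSString)
NSMutableCharacterSet *separators = [NSMutableCharacterSet punctuationCharacterSet];
[separators formUnionWithCharacterSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];
NSArray *words = [bigString componentsSeparatedByCharactersInSet:separators];
// Adjacent separators (e.g. ", ") produce empty strings; drop them
words = [words filteredArrayUsingPredicate:[NSPredicate predicateWithFormat:@"length > 0"]];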

Source: http://www.tech-recipes.com/rx/3418/cocoa-explode-break-nsstring-into-individual-words/
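For the lookup step that follows tokenizing, a rough plain-C equivalent (again compilable in an Objective-C file) might look like the sketch below. It treats ispunct() and isspace() characters as separators, which only approximates punctuationCharacterSet plus whitespaceAndNewlineCharacterSet for ASCII text; it lowercases each token and looks it up with bsearch() in a blacklist that is assumed to be lowercase and sorted. count_blacklisted, compare_word and MAX_TOKEN are hypothetical names.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define MAX_TOKEN 64 /* assumption: longer tokens are never blacklisted words */

/* bsearch comparator: key is a C string, elem points at a const char *. */
static int compare_word(const void *key, const void *elem) {
    return strcmp((const char *)key, *(const char *const *)elem);
}

/* ASCII approximation of Cocoa's punctuation + whitespace/newline sets. */
static int is_separator(unsigned char c) {
    return ispunct(c) || isspace(c);
}

/* Splits text on separators and counts tokens found in sorted_list,
   which must be lowercase and sorted with strcmp. */
static size_t count_blacklisted(const char *text,
                                const char *const *sorted_list, size_t n) {
    char token[MAX_TOKEN];
    size_t len = 0, hits = 0;
    int too_long = 0;
    assert(text != NULL);
    for (const char *p = text;; p++) {
        unsigned char c = (unsigned char)*p;
        if (c == '\0' || is_separator(c)) {
            /* End of a token; empty runs between separators are skipped. */
            if (len > 0 && !too_long) {
                token[len] = '\0';
                if (bsearch(token, sorted_list, n, sizeof *sorted_list, compare_word))
                    hits++;
            }
            len = 0;
            too_long = 0;
            if (c == '\0') break;
        } else if (len < MAX_TOKEN - 1) {
            token[len++] = (char)tolower(c);
        } else {
            too_long = 1; /* overlong token: never treated as a match */
        }
    }
    return hits;
}
```

Because each token must equal a list entry exactly, "darned" is not flagged by "darn"; whether that is desirable depends on the list.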

sosborn 2010-05-12 02:31:24

Thank you so much, sosborn! I'm now using your code. I appreciate your help there!

David van Dugteren 2010-05-12 02:54:30

@David, You should select this as the answer then, if you are using this as your solution.

Tim Jarvis 2010-05-12 03:01:26

Well, it's not the answer; it's a handy tip to complement what I'm trying to achieve.

David van Dugteren 2010-05-12 03:23:41

Answer 4

A:

A couple of things:

FSA won't necessarily work, depending on how intelligent you want the filter to be
Regexes are generally extremely slow, depending on how many you want to run
400 words is somewhat low, depending on your needs and languages
There are a number of extremely tricky cases to be careful of when filtering, particularly words embedded in other words, such as "ASSume"
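To illustrate the "ASSume" case, here is a small plain-C sketch (the helper names are hypothetical) contrasting naive substring search with a check that requires the match to be bounded by non-letters. Both helpers are case-sensitive, so the input is assumed to be lowercased first. The stricter check removes this false positive, but in exchange it misses listed words embedded in longer compounds.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Naive check: flags word anywhere in text, even inside another word. */
static int contains_substring(const char *text, const char *word) {
    return strstr(text, word) != NULL;
}

/* Stricter check: an occurrence counts only if the characters on both
   sides are not letters (or are the start/end of the text). */
static int contains_whole_word(const char *text, const char *word) {
    size_t n = strlen(word);
    assert(n > 0);
    for (const char *p = strstr(text, word); p != NULL; p = strstr(p + 1, word)) {
        int starts_word = (p == text) || !isalpha((unsigned char)p[-1]);
        int ends_word = !isalpha((unsigned char)p[n]);
        if (starts_word && ends_word) return 1;
    }
    return 0;
}
```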

My company, Inversoft, builds a commercial filtering solution and it is quite intelligent. It doesn't use regex or FSA, but has a custom-built fast-linear processing technology that makes it extremely fast and accurate (4,000+ messages per second). It also has over 600 English words in a number of categories including Slang, Racial Slurs, Drug, Gang, Religious, etc.

If you are looking for an intelligent context-aware solution with support, you should check out Clean Speak from Inversoft. Hooking it up to Obj-C should be simple using the XML WebService.

Brian P 2010-05-13 15:44:19
