+2  Q: 

swearing in code

I have an application that uses predefined word lists, but I want to extend it to give users the option of supplying their own custom lists.

Unfortunately lists like SOWPODS (the official Scrabble word list) are quite comprehensive and contain words I wouldn't want popping up on the screen.

I can easily get hold of a banned word list and build it into my application as a kind of swear filter. Is this likely to get my app trapped by any application filtering that may be present on Google Marketplace and, if so, is there a way around it? (encryption, compression etc.)

EDIT: Most of the answers so far are missing the point that the user will be supplying the list, so I have no control over its content and need to filter it in my app, either on import or as it is used. (Though they will still blame me if the app "swears" at them.)

A: 

I would start with a standard word list. A separate program would filter out any bad words and create your own modified standard word list.

Your application would then not need to worry about filtering anything out. No garbage in, no garbage out.

Steven
That's what I do at the moment but it won't help for a user imported list.
FixerMark
Then, I'd recommend filtering upon importing. It would be simpler than checking on every access.
Steven
Which brings me back to my original question: how do I obfuscate the banned word list I filter with so that I don't get picked up by spam filters because my application has every known swear word in it?
FixerMark
What about filtering would cause you to get picked up by the spam filters?
Steven
Having a list of 810 offensive words in my application. Perhaps not spam, but some kind of content filter may kick in, and I'm wondering if anyone knows whether Android Marketplace has anything in place, or whether compressing or encrypting my list is necessary to avoid the app getting flagged as suspicious.
FixerMark
+1  A: 

Why not have a list of good words instead of bad words? It is much easier to find, and it makes sure people cannot trick your filter. I do, however, believe that users do not really like filters.

UnixShadow
That won't handle sentences like "I want to stick my long-necked giraffe up your fluffy white bunny." http://habitatchronicles.com/2007/03/the-untold-history-of-toontowns-speedchat-or-blockchattm-from-disney-finally-arrives/
dan04
Sure it will handle sentences, if the list only contains one word :-)
UnixShadow
+2  A: 

Many *nix distros include a word list in a plain text file, /usr/share/dict/words (used for spell-check, etc.). On my OS X Leopard laptop the list appears to be stripped of the f-words. On my Linux server, the f-words are there. Check your *nix distro with grep to see what you have, and if it doesn't contain f-words, you could base your program on that word list.

Asaph
For real? Oh, Apple’s policies are just ridiculous and childish. This is embarrassing!
Konrad Rudolph
Have you ever misspelled that particular word? Would you really want it suggested as an autocomplete to your 8-year-old if they slightly misspelled "fork"?
Dean J
+1  A: 

I would think the banned word list would be relatively small (what, 15-20 words?). I haven't done anything like this in Java yet, but I imagine it would be simple: when the user imports a list, put that list into a binary search tree, then check it against the banned word list and delete any matching entries. Then save this filtered list and use it.

Just to add to this, I would perhaps have a popup dialog, or maybe a preference that allows the user to disable filtering. Always better to give the option. :)
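
The import-time filtering described above could be sketched roughly as follows. This is only a minimal illustration, not a definitive implementation: the class name ImportFilter, the method filterOnImport, and the placeholder words are all hypothetical, and it uses a HashSet for lookups rather than the binary search tree mentioned above (a TreeSet would be the closer match if a tree is preferred). It assumes whole-word, case-insensitive matching is enough.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: the class and method names are illustrative only.
final class ImportFilter {

    // Returns the imported words with every banned word removed.
    // Matching is case-insensitive and on whole words only (an assumption).
    static List<String> filterOnImport(List<String> imported, Set<String> banned) {
        // Normalise the banned list once so each lookup is a single set probe.
        Set<String> bannedLower = new HashSet<>();
        for (String b : banned) {
            bannedLower.add(b.trim().toLowerCase(Locale.ROOT));
        }

        List<String> kept = new ArrayList<>();
        for (String word : imported) {
            String key = word.trim().toLowerCase(Locale.ROOT);
            // Skip blank lines and anything found in the banned set.
            if (!key.isEmpty() && !bannedLower.contains(key)) {
                kept.add(word.trim());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Placeholder words stand in for a real banned list.
        Set<String> banned = new HashSet<>(Arrays.asList("darn", "heck"));
        List<String> imported = Arrays.asList("apple", "Darn", "banana", "heck", "cherry");
        System.out.println(filterOnImport(imported, banned)); // prints [apple, banana, cherry]
    }
}
```

Because the filtering runs once per import, the saved list can then be used without any further checks.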

kcoppock
I have a list of 810, though some of them are just offensive or contain non-alpha characters. Filtering on the fly is quick enough for such a short list, as I won't be generating words very quickly anyway.
FixerMark
Hmm, okay. Still, I would recommend going with a filter upon import. Better to do it once than every time, even if you aren't doing it quickly.
kcoppock
A: 

You should use WebPurify to filter out profanity.

Chris Miller
+3  A: 

Is there a reason you couldn't just filter the words upon import against a "bad words" list that, according to a previous comment you made, it sounds like you already compiled?

You could also add the option into a preferences menu so that it doesn't filter them on import.

Edit: Google's policies don't allow "excessive profanity." If it is rejected, I assume you could just appeal with the argument that it is a filter against profanity and your app would be accepted.

Kirk
Thanks. I was basically trying to find out if I needed to be excessively cautious and what techniques people had used in similar situations. This answer gives me a starting point.
FixerMark
+1: for the clearer answer. I think Apple is the only one with moral filters.
Chris Lively
+1  A: 

Random thought: why not build a Bloom Filter for disallowed words, and store the bits in the filter in your program's executable instead of the word list? Sure, you might get the odd false positive, but in the space of possible strings your word list is going to filter a lot more bits.
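
A rough sketch of the Bloom filter idea, under stated assumptions: the class name WordBloomFilter, the filter size (8192 bits), the number of probes (4), and the FNV-1a-based double hashing are all choices made for this illustration, not anything prescribed above. The point it shows is that only the bit array needs to ship with the app, while the words themselves are used only when the filter is built.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;
import java.util.Locale;

// Hypothetical sketch: class name, sizes, and hash choice are assumptions.
final class WordBloomFilter {
    private final BitSet bits;
    private final int size;      // number of bits in the filter
    private final int hashCount; // probe positions set/checked per word

    WordBloomFilter(int size, int hashCount) {
        this(size, hashCount, new byte[0]);
    }

    // Rebuilds a filter from bytes produced by toBytes(), e.g. a constant shipped in the app.
    WordBloomFilter(int size, int hashCount, byte[] stored) {
        this.size = size;
        this.hashCount = hashCount;
        this.bits = BitSet.valueOf(stored);
    }

    // 32-bit FNV-1a hash with a seed mixed into the starting value.
    private static int fnv1a(byte[] data, int seed) {
        int h = 0x811C9DC5 ^ seed;
        for (byte b : data) {
            h ^= (b & 0xFF);
            h *= 0x01000193;
        }
        return h;
    }

    // Derives hashCount bit positions from two base hashes (double hashing).
    private int[] positions(String word) {
        byte[] data = word.trim().toLowerCase(Locale.ROOT).getBytes(StandardCharsets.UTF_8);
        int h1 = fnv1a(data, 0);
        int h2 = fnv1a(data, 0x9E3779B9) | 1; // force odd so the step is never zero
        int[] pos = new int[hashCount];
        for (int i = 0; i < hashCount; i++) {
            pos[i] = Math.floorMod(h1 + i * h2, size);
        }
        return pos;
    }

    void add(String word) {
        for (int p : positions(word)) {
            bits.set(p);
        }
    }

    // True means "probably banned"; false means "definitely not in the list".
    boolean mightContain(String word) {
        for (int p : positions(word)) {
            if (!bits.get(p)) {
                return false;
            }
        }
        return true;
    }

    byte[] toBytes() {
        return bits.toByteArray();
    }

    public static void main(String[] args) {
        // Build step (could run offline); placeholder words stand in for the real list.
        WordBloomFilter built = new WordBloomFilter(8192, 4);
        built.add("darn");
        built.add("heck");

        // Runtime step: only the bytes are needed, not the words.
        WordBloomFilter shipped = new WordBloomFilter(8192, 4, built.toBytes());
        System.out.println(shipped.mightContain("Darn"));  // added words always match
        System.out.println(shipped.mightContain("apple")); // usually false; false positives are possible
    }
}
```

A larger bit array or a different number of probes changes the false-positive rate, so the numbers here would need tuning for a list of several hundred words.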

Alternatively, if what you're really worried about is someone doing a string dump on your application, some simple obfuscation like base64 should do the trick.
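
For the base64 suggestion, a minimal sketch might look like the following. The class name ObfuscatedList and its methods are hypothetical, and the words are placeholders. It uses java.util.Base64 from Java 8; on older Android API levels, android.util.Base64 would likely be the one to reach for instead. Note that this is only obfuscation against a casual string dump, not encryption.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Base64;
import java.util.List;

// Hypothetical helper: names are illustrative only.
final class ObfuscatedList {

    // Build-time step: join the words with newlines and base64-encode the result.
    static String encode(List<String> words) {
        String joined = String.join("\n", words);
        return Base64.getEncoder().encodeToString(joined.getBytes(StandardCharsets.UTF_8));
    }

    // Runtime step: decode the embedded blob back into the word list.
    static List<String> decode(String blob) {
        byte[] raw = Base64.getDecoder().decode(blob);
        String joined = new String(raw, StandardCharsets.UTF_8);
        List<String> words = new ArrayList<>();
        for (String w : joined.split("\n")) {
            if (!w.isEmpty()) {
                words.add(w);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        // Placeholder words stand in for the real banned list.
        String blob = encode(Arrays.asList("darn", "heck"));
        System.out.println(blob);         // this string is what would be embedded in the app
        System.out.println(decode(blob)); // prints [darn, heck]
    }
}
```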

Daniel Pryden
+1  A: 

I commented, but this is really more of an answer.

I think you need to learn how "spam" and "content filtering" work.

Neither of those things will prevent your app from containing or emitting any type of word. To be very clear, neither is going to search the binary of your application for those words.

That said, you can absolutely keep a list of words with your installer that you use to filter out what is displayed to the user regardless of what they upload.

BTW, "spam" filters are there to stop spam email from being received and hence block those. Content Filters work two ways. First, by letting the content providers explicitly state what audience their content is good for and second by filtering the data as it comes across. These do NOT work inside of an application; rather, they work on the data a web browser receives.

Chris Lively
OK so I was a bit lazy in my terminology but it was just quicker to write "spam filter" than "the-software-that-google-marketplace-may or-may-not-use-to-determine-if-applications-uploaded-have-inappropriate-content".
FixerMark
I've now updated my question to remove the misleading and incorrectly used word "SPAM" and clarify that I was concerned about any application "suitability testing" that may be in place in Google Marketplace.
FixerMark