We are using JCaptcha for a captcha tool in a small app that my team is writing. However, just during development time (on a small team - 4 of us), we've run across a number of curse words and other potentially offensive words for the actual captchas. Is there a way to filter out potentially offensive words so that they are not presented to the user?
I spent some time downloading JCaptcha and looking at the source. Basically, JCaptcha works like most captcha tools out there other than reCaptcha. Hence what you want to do is trivial.
JCaptcha uses the very simple concept of a WordGenerator, which is an interface:
    public interface WordGenerator {
        String getWord(Integer length);
        String getWord(Integer length, Locale locale);
    }
Let us ignore localization.
Typical use is like this:
    WordGenerator words = ...
    WordToImage word2image = new SimpleWordToImage();
    ImageCaptchaFactory factory = new GimpyFactory(words, word2image);
    pixCaptcha = factory.getImageCaptcha();
In their unit tests we can see the following, used for testing purposes:
    WordGenerator words = new DummyWordGenerator("TESTING");
    WordToImage word2image = new SimpleWordToImage();
    ImageCaptchaFactory factory = new GimpyFactory(words, word2image);
    pixCaptcha = factory.getImageCaptcha();
Note that we have complete control over the WordGenerator that is used.
Here's one (working, fully functional) word generator I just wrote:
    private static final Random r = new Random( System.currentTimeMillis() );

    public String getWord( final Integer length ) {
        final StringBuilder sb = new StringBuilder();
        for (int i = 0; i < length; i++) {
            final int rnd = r.nextInt( 52 );
            final char c = (char) (rnd < 26 ? 'a' + rnd : 'A' + (rnd-26));
            sb.append( c );
        }
        return sb.toString();
    }
It generates random "words" like these:
fqXVxId
cdVWBSZ
zXeJFaY
aeoSeEb
OuBfzvL
unYewjG
EhbzRup
GkXkTyQ
yDGnHmh
mRFgHWM
FFBkTLF
DvCHIIT
fDmjqLH
XMWSOpa
muukLLN
jUedgYK
FlbWARe
WohMMgZ
lmeLHau
djHRqlc
Note that if you prefer "real words", that's not an issue: simply change getWord(...) to pick words at random out of a dictionary. (reCaptcha uses real words too, but for another purpose altogether: it helps with scanning/OCRing books.)
Now how do you prevent insulting words from being picked? This is trivial. Here I just give one example (please, no arguing about the code, it's really just one example that shows how it could be done):
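To make the dictionary idea concrete, here is a minimal sketch. The class name DictionaryWordPicker and the tiny word list are made up for illustration, and the class deliberately does not implement JCaptcha's WordGenerator so that it compiles on its own; it only mirrors the getWord(Integer) signature shown earlier.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical helper, not part of JCaptcha: picks a random dictionary
// word of the requested length.
final class DictionaryWordPicker {
    private final List<String> dictionary;
    private final Random random;

    DictionaryWordPicker(final List<String> dictionary, final Random random) {
        this.dictionary = new ArrayList<String>(dictionary);
        this.random = random;
    }

    String getWord(final Integer length) {
        // Collect every word that has exactly the requested length.
        final List<String> matching = new ArrayList<String>();
        for (final String w : dictionary) {
            if (w.length() == length.intValue()) {
                matching.add(w);
            }
        }
        if (matching.isEmpty()) {
            throw new IllegalArgumentException("no word of length " + length);
        }
        return matching.get(random.nextInt(matching.size()));
    }
}
```

In a real setup the list would be loaded from a word file, and java.security.SecureRandom (a subclass of Random) can be passed in instead if predictability of the words is a concern.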
    private static final Set<String> s = new HashSet<String>();

    static {
        s.add( "fuck" );
        s.add( "suck" );
        s.add( "dick" );
    }

    private static final Random r = new Random( System.currentTimeMillis() );

    public String getWord( Integer length ) {
        String cand = getRandomWord( length );
        while ( isSwearWord(cand) ) {
            cand = getRandomWord( length );
        }
        return cand;
    }

    private boolean isSwearWord( final String w ) {
        return s.contains( w.toLowerCase() );
    }

    public String getRandomWord( final Integer length ) {
        final StringBuilder sb = new StringBuilder();
        for (int i = 0; i < length; i++) {
            final int rnd = r.nextInt( 52 );
            final char c = (char) (rnd < 26 ? 'a' + rnd : 'A' + (rnd-26));
            sb.append( c );
        }
        return sb.toString();
    }
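One thing to keep in mind: an exact-match lookup only rejects a candidate that is a banned word in its entirety, so a longer random string that merely contains one still gets through. A possible variation, sketched here with a made-up class name (SubstringSwearFilter) and assuming a short banned list, is to test for substrings instead:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: rejects any candidate that contains a banned word
// anywhere inside it, ignoring case.
final class SubstringSwearFilter {
    private final List<String> banned = new ArrayList<String>();

    SubstringSwearFilter(final List<String> bannedWords) {
        // Normalise the banned list once so lookups can compare lower case.
        for (final String bad : bannedWords) {
            banned.add(bad.toLowerCase(Locale.ROOT));
        }
    }

    boolean isOffensive(final String candidate) {
        final String lower = candidate.toLowerCase(Locale.ROOT);
        for (final String bad : banned) {
            if (lower.contains(bad)) {
                return true;
            }
        }
        return false;
    }
}
```

This is a linear scan over the banned list for each candidate, which should be fine for a list of a few hundred words; it also rejects innocent words that happen to contain a banned one, which is usually acceptable for a captcha.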
Now if you want to prevent swear words, you probably also want to prevent words that are close to swear words (e.g. "fvck", "dikk", etc.). This is once again trivial:
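    private boolean isSwearWord( final String w ) {
        final String lower = w.toLowerCase();
        // Check the word itself too, in case the helper below only
        // returns strings at edit distance exactly one.
        if ( s.contains( lower ) ) {
            return true;
        }
        final List<String> ls = generateAllPermutationsWithLevenshteinEditDistanceOne( lower );
        for ( final String cand : ls ) {
            if ( s.contains( cand ) ) {
                return true;
            }
        }
        return false;
    }

Writing the method "generateAllPermutationsWithLevenshteinEditDistanceOne(w)" is left as an exercise for the reader.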
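For readers who want a starting point for that exercise, here is one possible sketch. The class and method names (EditDistanceOne, neighbours) are made up, the alphabet is assumed to be plain a-z, and the result is a Set rather than a List; wrap it in new ArrayList<String>(...) if a List is needed.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: every string reachable from a word by exactly one
// insertion, deletion or substitution of a letter a-z.
final class EditDistanceOne {
    private static final char[] ALPHABET = "abcdefghijklmnopqrstuvwxyz".toCharArray();

    static Set<String> neighbours(final String word) {
        final String w = word.toLowerCase(Locale.ROOT);
        final Set<String> out = new HashSet<String>();
        for (int i = 0; i <= w.length(); i++) {
            final String head = w.substring(0, i);
            final String tail = w.substring(i);
            // Insertion: put a letter between head and tail.
            for (final char c : ALPHABET) {
                out.add(head + c + tail);
            }
            if (!tail.isEmpty()) {
                final String rest = tail.substring(1);
                // Deletion: drop the first character of tail.
                out.add(head + rest);
                // Substitution: replace the first character of tail.
                for (final char c : ALPHABET) {
                    out.add(head + c + rest);
                }
            }
        }
        // Substituting a letter with itself reproduces the word (distance 0).
        out.remove(w);
        return out;
    }
}
```

For a word of length n this produces on the order of 50n strings, so checking each one against a HashSet stays cheap for captcha-sized words.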