I have a textarea inside a form.

Before the form is submitted, the textarea is validated to check that it is not empty, is not over 2000 characters, does not contain forbidden characters, etc.

I am working on the last part of the validation, which would need the textarea to be compared to an array of "bad words".

This is to help me maintain good language on my site.

I am not very good at JS, so does anybody know of a way to compare each word in the textarea against the array of bad words?

Also, would this slow down the validation much? (the array contains at most 100 words).

Thanks

+1  A: 

If you wanted to check for the presence of "expletive1" and "expletive2" you'd do the following:

var my_textarea = document.getElementById('textarea_id');

if (/\b(?=\w)(expletive1|expletive2)\b(?!\w)/i.test(my_textarea.value)) {
    // we found bad words!  do something
} else {
    // no bad words found, carry on, nothing to see here
}

And you'd just add more words to the list in the same manner (expletive1|expletive2|expletive3|expletive4)

Keep in mind that to keep the words out of your app entirely you'll also need to do server-side filtering.

Aaron
Care to simplify the code so I understand it? Will this match each word in the textarea to the array?
Camran
Made some edits that hopefully clarify it a bit. It will look at the entire textarea and match if it finds any of those words in it. Keep in mind that this is a very simple regex and would be pretty easy to circumvent (e.g. poop is distinct from poops as far as this regex is concerned).
Aaron
OK, so what happens if I add both "poop" and "poops" to the list? You mean it won't match "poops"? Thanks
Camran
If you add both "poop" and "poops" then it will match both of them. I was just pointing out that you would have to enter every single bad word with every possible variation, suffix, and prefix. It would be possible to write a more sophisticated regex that would match poops, pooping, pooped, etc. just by entering "poop" in your list (but then you'd be more likely to accidentally get false positives).
Aaron
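For what it's worth, here is a minimal sketch of the "more sophisticated regex" idea from the comment above: each base word is allowed an optional common suffix. The helper name `buildVariantRegex` and the suffix list (s, ed, ing) are assumptions made up for illustration, not something from the answer, and the sketch assumes the words contain only letters.

```javascript
// Hypothetical helper: matches each base word on its own or followed by
// one of a few common English suffixes (the suffix list is an assumption).
function buildVariantRegex(baseWords) {
    // Assumes the words contain only letters, so no regex escaping is done here.
    return new RegExp('\\b(?:' + baseWords.join('|') + ')(?:s|ed|ing)?\\b', 'i');
}

var variantRegex = buildVariantRegex(['poop']);

variantRegex.test('He pooped');      // true
variantRegex.test('Stop POOPING!');  // true
variantRegex.test('snoop around');   // false, "poop" does not start at a word boundary
```

It will not catch variations with a doubled consonant or a changed stem, and every suffix you allow is another chance for a false positive, so treat the suffix list as something to tune.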
It will match if you have 'poops' in there. What he means is that the regex (regular expression) would not detect plurals, but you can set it up that way. I have a feeling you do not know what they are. They can restrict input by characters, length, and even specific words. There is lots of control ... and lots of learning. Check out this site: http://www.regular-expressions.info/
Tarik
+1  A: 
var bad_words = ['stupid', 'dang']; // watered down
for (var i = 0; i < bad_words.length; i++) { // use <, not <=: match(undefined) would match any text
    if (document.getElementById('my_textarea').value.match(bad_words[i])) {
        // has bad word!
    }
}

This will keep your code a bit neater, because you don't have to have 100 words in one regex match.

Jonah Bron
Won't this try to match the entire textarea text against every single word? I need to split the textarea to check word by word.
Camran
This is a fairly naive regex that will end up banning words like 'assorted'.
Aaron
It is cleaner and easier to manage though, so it does deserve credit too. Editing the word list inside the if statement is ugly. Camran, perhaps you can combine both Aaron and Jonah's answers.
Tarik
Actually, if you want to edit the word list inside an array instead of the regex then I like Topera's solution of building the regex from the array better. It results in only one regex being called instead of 1 for every single bad word (I haven't performance tested lots of simple regexes vs. 1 more complex regex but my gut tells me 1 regex will be significantly more performant). Topera just needs to stick word delimiters at the front and end of his regex so that he only matches whole words.
Aaron
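A rough sketch of what the comment above describes: build one regex from the array and wrap it in word boundaries so only whole words match. The names `escapeRegExp`, `buildWordRegex`, and `badWordRegex` are made up for this example, and it assumes every word in the list starts and ends with a letter, digit, or underscore (otherwise `\b` will not behave as expected).

```javascript
// Escape regex metacharacters so a list entry is always matched literally.
function escapeRegExp(word) {
    return word.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build a single case-insensitive regex of the form \b(?:w1|w2|...)\b.
function buildWordRegex(words) {
    var escaped = words.map(escapeRegExp);
    return new RegExp('\\b(?:' + escaped.join('|') + ')\\b', 'i');
}

var badWordRegex = buildWordRegex(['ass', 'dang']);

badWordRegex.test('an assorted box');  // false, "ass" is only part of a longer word
badWordRegex.test('Dang it!');         // true

// In the page you would test the textarea's value instead, e.g.
// badWordRegex.test(document.getElementById('my_textarea').value)
```

The regex is built once and has no "g" flag, so calling `test()` repeatedly does not carry state between calls.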
+1  A: 

This code replaces bad words with asterisks:

// creating regex
var words = ['bad', 'words'];
var wordsStr = "";
for(var i=0; i<words.length; i++) {
    wordsStr += words[i];
    if (i < words.length -1) {
        wordsStr += "|";
    }
}
// wordsStr is "bad|words"
var regex = new RegExp(wordsStr, "gi"); // g: replace all; i:insensitive

// replacing
var text = "I cant say bad words!";
text = text.replace(regex, "****");
// text is "I cant say **** ****!"

See in jsfiddle

Topera
+1  A: 
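var bad_words = ['word1', 'word2'];
// Read the textarea's text via .value, then split it on runs of non-word characters
var user_words = document.getElementById('textarea').value.split(/\W+/);

// Plain indexed loop: for...in on an array also visits any inherited enumerable properties
for (var i = 0; i < bad_words.length; i++)
{
  if (user_words.indexOf(bad_words[i]) != -1)
  {
    alert('The textarea has a bad word!');
    break;
  }
}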
ovais.tariq
@ovais.tariq: +1 and welcome to SO, I believe we had bit of discussion at my blog :)
Sarfraz
Won't work if the bad word is next to punctuation.
Aaron
@sarfaraz: yeah, I remember :)
ovais.tariq
@aaron: I have modified the split; it now breaks the string on any non-word character.
ovais.tariq
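Building on the split approach above, here is a small sketch that ignores case and reports which listed words were found. The function name `findBadWords` is hypothetical, and the sketch assumes an environment with `Set` (ES2015 or later). Note that `\W` treats anything outside letters, digits, and underscore as a separator, so accented letters will split words.

```javascript
// Hypothetical helper: returns the listed bad words that appear in the text
// as whole words, ignoring case. Each word is reported at most once.
function findBadWords(text, badWords) {
    // A Set gives constant-time lookups, so the cost is roughly one pass over the text.
    var lookup = new Set(badWords.map(function (w) { return w.toLowerCase(); }));
    var found = [];
    text.toLowerCase().split(/\W+/).forEach(function (word) {
        if (lookup.has(word) && found.indexOf(word) === -1) {
            found.push(word);
        }
    });
    return found;
}

var hits = findBadWords('Word1, then WORD2. word1 again!', ['word1', 'word2']);
// hits is ['word1', 'word2']

// In the page you would pass the textarea's value, e.g.
// findBadWords(document.getElementById('textarea').value, bad_words)
```

With a list of around 100 words and at most 2000 characters of text, a single pass like this should not be a noticeable cost during validation.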