Does anyone know of any sample PHP (ideally CodeIgniter) code for parsing user-submitted comments, to remove profanity, HTML tags, etc.?
A:
Try strip_tags to get rid of any HTML submitted. If you only want to make sure that no HTML is rendered in the comments, use htmlspecialchars to escape the tags instead; as Matchu's example shows, it has fewer unintended effects than strip_tags.
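To make the difference concrete, here is a minimal sketch comparing the two functions on the same input. The sample string and variable names are made up for illustration; the ENT_QUOTES flag and the explicit UTF-8 charset are assumptions about how you would want to escape.

```php
<?php
// Hypothetical user input, for illustration only.
$comment = 'Hello <world>! <b>Nice</b> post.';

// strip_tags() deletes anything that looks like a tag, including "<world>".
$stripped = strip_tags($comment);

// htmlspecialchars() keeps every character but escapes the HTML-significant ones.
$escaped = htmlspecialchars($comment, ENT_QUOTES, 'UTF-8');

echo $stripped, "\n"; // Hello ! Nice post.
echo $escaped, "\n";  // Hello &lt;world&gt;! &lt;b&gt;Nice&lt;/b&gt; post.
```

Note that strip_tags silently drops "<world>" even though the user probably meant it as text, which is the kind of unintended effect mentioned above.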
For a word filter, depending on how in-depth you want to go, there are many examples on the web, from simple to complex. Here's the code from Jake Olefsky's example (the simple one linked previously):
<?php
// This is totally free to use by anyone for any purpose.
// BadWordFilter
// This function does all the work. If $replace is 1 it will replace all bad words
// with the wildcard replacements. If $replace is 0 it will not replace anything.
// In either case, it will return 1 if it found bad words or 0 otherwise.
// Be sure to fill the $bads array with the bad words you want filtered.
// Note: eregi() and eregi_replace() were removed in PHP 7, so this version uses
// case-insensitive preg_match() and preg_replace() instead.
function BadWordFilter(&$text, $replace)
{
    // fill this array with the bad words you want to filter and their replacements
    $bads = array(
        array("butt", "b***"),
        array("poop", "p***"),
        array("crap", "c***")
    );
    if ($replace == 1) { // we are replacing
        $remember = $text;
        for ($i = 0; $i < count($bads); $i++) { // go through each bad word
            $pattern = '/' . preg_quote($bads[$i][0], '/') . '/i'; // case-insensitive match
            $text = preg_replace($pattern, $bads[$i][1], $text); // replace it
        }
        if ($remember != $text) return 1; // if there are any changes, return 1
    } else { // we are just checking
        for ($i = 0; $i < count($bads); $i++) { // go through each bad word
            $pattern = '/' . preg_quote($bads[$i][0], '/') . '/i';
            if (preg_match($pattern, $text)) return 1; // if we find any, return 1
        }
    }
    return 0; // no bad words were found
}
// sample input; replace with the submitted comment text
$wordsToFilter = "Well, crap.";
// this will replace all bad words with their replacements. $any is 1 if it found any
$any = BadWordFilter($wordsToFilter, 1);
// this will not replace any bad words. $any is 1 if it found any
$any = BadWordFilter($wordsToFilter, 0);
?>
Many more examples of this can be found easily on the web.
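One thing to watch for with a simple filter like this is that it matches substrings, so a harmless word such as "scrap" gets censored too. A rough sketch of a whole-word variant follows; the function name censor_whole_words and the word list are hypothetical, and it assumes that matching on regex word boundaries (\b) is good enough for your comments.

```php
<?php
// Hypothetical helper; the name and word list are illustrative only.
function censor_whole_words($text, array $replacements)
{
    foreach ($replacements as $bad => $mask) {
        // \b anchors the match to word boundaries; the "i" flag ignores case.
        $pattern = '/\b' . preg_quote($bad, '/') . '\b/i';
        $text = preg_replace($pattern, $mask, $text);
    }
    return $text;
}

$words = array('crap' => 'c***', 'poop' => 'p***');
$clean = censor_whole_words('Crap, I left scrap paper behind.', $words);
echo $clean, "\n"; // c***, I left scrap paper behind.
```

This still will not catch deliberate misspellings or spacing tricks, so treat it as a starting point rather than a complete filter.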
jball
2010-02-02 00:22:18
I vote for htmlspecialchars instead, to ensure that nothing like "Hello <world>!" is mistaken for HTML.
Matchu
2010-02-02 00:24:07
If Alex wants to escape them and not remove them, then I agree, htmlspecialchars is better.
jball
2010-02-02 00:26:07
Depends what "remove" means :)
Matchu
2010-02-02 00:32:36
That's true - the intent is ambiguous :)
jball
2010-02-02 00:40:39
I want to remove any HTML; effectively, I am trying to take whatever the user enters and turn it into plain text. So I want to delete any HTML completely rather than just escape it.
Alex
2010-02-02 01:37:05
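For the plain-text goal described in that last comment, a minimal sketch might look like the following. The helper name comment_to_plain_text and the sample input are hypothetical, and it assumes you still escape the stored text with htmlspecialchars when printing it, since stray "<" or "&" characters can survive stripping.

```php
<?php
// Hypothetical helper; not part of PHP or CodeIgniter.
function comment_to_plain_text($input)
{
    // strip_tags() removes the tags themselves but keeps the text between them.
    return trim(strip_tags($input));
}

// Made-up submission, for illustration only.
$raw = '<p>Great <a href="http://example.com">article</a>!</p>';
$plain = comment_to_plain_text($raw);

// Escape on output as a second layer of protection.
echo htmlspecialchars($plain, ENT_QUOTES, 'UTF-8'), "\n"; // Great article!
```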