Does anyone know of any sample PHP (ideally CodeIgniter) code for parsing user-submitted comments, to remove profanity, HTML tags, etc.?

+1  A: 

Try strip_tags to remove any submitted HTML. If you only need to make sure that no HTML is rendered in the comments, use htmlspecialchars to escape the tags instead; as Matchu's example in the comments shows, it has fewer unintended effects than strip_tags.
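To make the difference concrete, here is a minimal sketch that runs both functions on the same string. The sample comment is made up for illustration and is not from the thread.

```php
<?php
// Hypothetical sample input, chosen to include text that only looks like a tag.
$comment = 'Hello <world>! <b>Nice</b> post.';

// strip_tags() deletes anything that looks like a tag, including "<world>".
$stripped = strip_tags($comment);

// htmlspecialchars() keeps every character but encodes the special ones,
// so the browser displays the text literally instead of interpreting it.
$escaped = htmlspecialchars($comment, ENT_QUOTES, 'UTF-8');

echo $stripped, "\n"; // Hello ! Nice post.
echo $escaped, "\n";  // Hello &lt;world&gt;! &lt;b&gt;Nice&lt;/b&gt; post.
```

Note that strip_tags silently drops "<world>", while htmlspecialchars preserves it as visible text.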

For a word filter, there are many examples on the web, ranging from simple to complex depending on how in-depth you want to go. Here's the code from Jake Olefsky's example (the simple one):

<?php
//This is totally free to use by anyone for any purpose.

// BadWordFilter
// This function does all the work. If $replace is 1 it will replace all bad words
// with the wildcard replacements.  If $replace is 0 it will not replace anything.
// In either case, it will return 1 if it found bad words or 0 otherwise.
// Be sure to fill the $bads array with the bad words you want filtered.
function BadWordFilter(&$text, $replace)
{
    //fill this array with the bad words you want to filter and their replacements
    $bads = array (
        array("butt","b***"),
        array("poop","p***"),
        array("crap","c***")
    );

    if($replace==1) {                               //we are replacing
        $remember = $text;

        for($i=0;$i<sizeof($bads);$i++) {           //go through each bad word
            $text = str_ireplace($bads[$i][0],$bads[$i][1],$text); //replace it (case-insensitive; eregi_replace was removed in PHP 7)
        }

        if($remember!=$text) return 1;              //if there are any changes, return 1

    } else {                                        //we are just checking

        for($i=0;$i<sizeof($bads);$i++) {           //go through each bad word
            if(stripos($text,$bads[$i][0])!==false) return 1; //if we find any, return 1 (eregi was removed in PHP 7)
        }   

    }
    return 0;                                       //no bad words found, or nothing was changed
}

//this will replace all bad words with their replacements. $any is 1 if it found any
$any = BadWordFilter($wordsToFilter,1); 

//this will not replace any bad words. $any is 1 if it found any
$any = BadWordFilter($wordsToFilter,0); 

?>

Many more examples of this can be found easily on the web.
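One limitation of the simple filter above is that it matches substrings, so "butt" also hits "button". The following is a rough sketch of a whole-word variant, not part of Jake Olefsky's code. The function name censorWholeWords, its signature, and the masking rule are assumptions made for illustration, and it assumes ASCII words.

```php
<?php
// Hypothetical helper: masks whole-word matches only, keeping the first letter.
function censorWholeWords(string $text, array $badWords): string
{
    if (count($badWords) === 0) {
        return $text; // nothing to filter
    }

    // Escape each word so it is treated literally inside the pattern.
    $quoted = array_map(function ($word) {
        return preg_quote($word, '/');
    }, $badWords);

    // \b anchors the match at word boundaries; the i flag ignores case.
    $pattern = '/\b(' . implode('|', $quoted) . ')\b/i';

    return preg_replace_callback($pattern, function ($match) {
        // Keep the first character and replace the rest with asterisks.
        return substr($match[0], 0, 1) . str_repeat('*', strlen($match[0]) - 1);
    }, $text);
}

echo censorWholeWords('Crap, the button broke.', array('butt', 'crap')), "\n";
// C***, the button broke.
```

To find out whether anything was filtered, compare the returned string with the original.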

jball
I vote for htmlspecialchars instead, to ensure that nothing like "Hello <world>!" is mistaken for HTML.
Matchu
If Alex wants to escape them and not remove them, then I agree, htmlspecialchars is better.
jball
Depends what "remove" means :)
Matchu
That's true - the intent is ambiguous :)
jball
I want to remove any HTML. Effectively, I am trying to take whatever the user enters and turn it into plain text, so I want to delete any HTML completely rather than just escape it.
Alex
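Given that clarification, one possible approach is sketched below. The helper name commentToPlainText and the exact steps (strip tags, decode entities, collapse whitespace) are assumptions for illustration, not something confirmed in the thread.

```php
<?php
// Hypothetical helper: reduces submitted markup to a single line of plain text.
function commentToPlainText(string $raw): string
{
    // Remove anything that looks like an HTML tag.
    $text = strip_tags($raw);

    // Turn entities such as &amp; back into the characters they stand for.
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Collapse runs of whitespace (including newlines) into single spaces.
    $text = preg_replace('/\s+/', ' ', $text);

    return trim($text);
}

$plain = commentToPlainText("<p>Fish &amp; chips</p>\n<a href=\"#\">menu</a>");
echo $plain, "\n"; // Fish & chips menu

// Decoding entities can reintroduce "<" and ">", so still escape on output.
echo htmlspecialchars($plain, ENT_QUOTES, 'UTF-8'), "\n"; // Fish &amp; chips menu
```

Even when the stored text is plain, escaping it with htmlspecialchars at display time remains a sensible safeguard.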