Hi,

What are good options for restricting the HTML tags a user is allowed to enter into a form field? I'd like to be able to do that client-side (presumably using JavaScript), server-side in PHP if it's too heavy for the user's browser, and possibly a combination of both if appropriate.

Effectively, I'd like users to be able to submit data with the same tag set as on Stack Overflow, plus maybe the standard MathML tags. The form must accept UTF-8 text, including Asian ideograms, etc.

In the application, the user must be able to submit text entries with basic HTML tags, and those entries must be displayable to (potentially different) users, with the HTML rendered correctly and in a way that is safe for those users. I'm planning to use htmlspecialchars() and htmlspecialchars_decode() to protect my db server-side.

Many thanks,

JDelage

PS: I searched but couldn't find this question...

+4  A: 

If you're looking to filter input against XSS attacks etc., consider using an existing library like HTML Purifier. I've not used it myself yet, but it promises a lot and is held in high regard.

HTML Purifier is a standards-compliant HTML filter library written in PHP. HTML Purifier will not only remove all malicious code (better known as XSS) with a thoroughly audited, secure yet permissive whitelist, it will also make sure your documents are standards compliant, something only achievable with a comprehensive knowledge of W3C's specifications.
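
For orientation, here is a minimal sketch of how the library is typically called. It assumes HTML Purifier is already installed and loaded (for example through Composer's autoloader); the function name purifyUserHtml and the tag list are hypothetical choices for illustration, not something from the question.

```php
<?php
// Assumes HTML Purifier has been loaded beforehand,
// e.g. via require 'vendor/autoload.php' in a Composer project.

// Hypothetical helper: keeps only the whitelisted elements and attributes.
function purifyUserHtml(
    string $dirty,
    string $allowed = 'p,br,b,i,em,strong,code,pre,blockquote,ul,ol,li,a[href]'
): string {
    $config = HTMLPurifier_Config::createDefault();
    // 'HTML.Allowed' is the whitelist of elements (and [attributes]).
    $config->set('HTML.Allowed', $allowed);

    $purifier = new HTMLPurifier($config);
    return $purifier->purify($dirty);
}

// Only run the demo when the library is actually available.
if (class_exists('HTMLPurifier')) {
    echo purifyUserHtml('<p onclick="x()">Hi <b>there</b><script>alert(1)</script></p>');
}
```

The whitelist string is the part you would adapt to your own tag set.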

Pekka
Thanks for your input. This looks a bit complicated. I have never worked with libraries before...
JDelage
@JDelage yeah, it will take some time getting into, but I'm pretty sure it is much, much easier than starting to weed out all the potential dangers by yourself.
Pekka
Fair enough. The other problem though is to explain to users what they can and cannot do. I don't want to simply say "Your code is not allowed".
JDelage
@JDelage you could use `strip_tags()` (http://www.php.net/strip_tags) in conjunction with the `allowable_tags` parameter to do an initial filtering (and return an error message if an illegal tag is detected), then run HTML Purifier afterwards. Not sure whether this will work for every situation, just an idea.
Pekka
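
A rough sketch of the "tell the user which tags are not allowed" part of the comment above. Note that it scans for tag names with a regular expression rather than using strip_tags(), and it is meant only for building a friendly error message, not as the security filter itself. The function name findDisallowedTags and the sample whitelist are hypothetical.

```php
<?php
// Hypothetical helper: returns the lowercase names of tags that appear
// in $input but are not in the $allowed whitelist.
function findDisallowedTags(string $input, array $allowed): array
{
    // Collect the name of every opening or closing tag in the input.
    preg_match_all('~</?([a-zA-Z][a-zA-Z0-9]*)~', $input, $matches);
    $used = array_unique(array_map('strtolower', $matches[1]));

    // Whatever is used but not whitelisted gets reported back.
    return array_values(array_diff($used, $allowed));
}

$bad = findDisallowedTags(
    '<b>ok</b> <script>alert(1)</script> <IMG src=x>',
    ['b', 'i', 'a']
);

if ($bad) {
    // Prints: These tags are not allowed: script, img
    echo 'These tags are not allowed: ' . implode(', ', $bad);
}
```

The actual cleaning would still be done afterwards by a proper filter such as HTML Purifier.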
A: 

You could do something like this, if you are familiar with regular expressions:

<?php

function parse($string)
{
    // To stop unwanted HTML tags being used
    $string = str_replace("<", "&lt;", $string); // Replace all < with the HTML equiv
    $string = str_replace(">", "&gt;", $string); // Replace all > with the HTML equiv

    $find = array(
        "%\*\*\*(.+?)\*\*\*%s", // Search for ***any string here***
        "%`(.+?)`%s",           // Search for `any string here`
    );

    $replace = array(
        "<b>\\1</b>",                                           // Replace with <b>any string here</b>
        "<span style=\"background-color: #DDDDDD\">\\1</span>"  // Replace with <span style="background-color: #DDDDDD">any string here</span>
    );

    $string = preg_replace($find, $replace, $string); // Do the find and replace
    return $string; // Return the output
}

echo parse("***Hello*** `There` <b>Friend</b>");
?>

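Rendered in a browser, this displays as follows, with "Hello" in bold, "There" on a grey background, and the <b> tags around "Friend" shown literally:

Hello There <b>Friend</b>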

Chief17
+1  A: 

I think it is much easier to use strip_tags() and just specify the tags you allow.
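
As a small illustration of that approach (the whitelist and the function name keepAllowedTags are hypothetical, chosen just for this sketch):

```php
<?php
// Hypothetical whitelist; adjust it to the tags you want to permit.
$allowedTags = '<b><i><em><strong><code><pre><a><br>';

// Removes every tag that is not listed in $allowedTags.
function keepAllowedTags(string $input, string $allowedTags): string
{
    return strip_tags($input, $allowedTags);
}

// The <script> tags are removed, but the text between them is kept.
echo keepAllowedTags('<b>Bold</b> <script>alert(1)</script> text', $allowedTags);
// Prints: <b>Bold</b> alert(1) text
```

Keep in mind that strip_tags() leaves the attributes of allowed tags untouched, so something like an onmouseover attribute or a javascript: link on an allowed tag would still get through.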

Ionut Staicu
+1  A: 

I had a similar problem for some time. There were some $%^&*) who liked to post comments like <script>alert('Hello');</script>. I got tired of it and wrote a small function that allows only <br> or <br /> tags, so that messages still display with their line breaks. I did it only in PHP, but I think it might help you.

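function eliminateTags($msg) {
    // Turn newlines into <br /> tags, then decode entities so that
    // tags submitted in encoded form (&lt;script&gt;) are stripped too.
    $withBreaks = nl2br($msg);
    $decoded = htmlspecialchars_decode($withBreaks);

    // Listing '<br>' also keeps '<br />'. Per the PHP manual, self-closing
    // forms should not be put in the allowed-tags argument, so no PHP
    // version check is needed here.
    // Note: strip_tags() does not remove attributes from allowed tags.
    return strip_tags($decoded, '<br>');
}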
Eugene