Hello guys,

I am currently working on an application that requires users to submit posts and comments, which are displayed on the site. Since user input can't be trusted, I used htmlspecialchars($string, ENT_QUOTES) to process users' posts and comments.

Now I want certain HTML tags to be left alone, such as <b>, <br /> and a few more. How can I make htmlspecialchars ignore some tags while it still escapes the others?

A: 

I would strongly recommend using Zend_Filter to filter user input. Specifically, see: http://framework.zend.com/manual/en/zend.filter.html#zend.filter.introduction.using

daniel
An example of Zend_Filter_StripTags is at http://stackoverflow.com/questions/1069805/use-of-zend-framework-settagsallowed-gettagsallowed/1070052#1070052
Cal Jacobson
+3  A: 

solution a)
Use strip_tags instead of htmlspecialchars, and whitelist the needed tags.
better solution b)
Use BBCode and give aliases to the wanted tags, e.g. [b]bold[/b]
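A minimal sketch of solution a), assuming only <b> and <br> should survive. The helper name filter_post is hypothetical. Note that strip_tags() leaves the attributes of allowed tags untouched, so something like <b onmouseover="..."> would still get through; on its own this is probably not enough for untrusted input.

```php
<?php
// Hypothetical helper: keep only <b> and <br>, drop every other tag.
function filter_post(string $input): string
{
    // The second argument lists the tags that strip_tags() leaves in place.
    // Attributes on those tags are NOT removed.
    return strip_tags($input, '<b><br>');
}

echo filter_post('<b>Hi</b> <script>alert(1)</script>');
// prints: <b>Hi</b> alert(1)
```

The tag is removed but its text content stays, which is why the script body shows up as plain text in the output.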

erenon
+1  A: 

After escaping, you can replace the escaped strings to re-insert the allowed tags. For <b> tags, for example:

$string = str_replace(array('&lt;b&gt;', '&lt;/b&gt;'), array('<b>', '</b>'), $string);

I would only allow very distinct, complete tags to be as secure as possible. In other words, don't use regular expressions if you don't have to; they can lead to very nasty bugs.
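The same idea can be extended to a small whitelist. This is only a sketch: the function name escape_with_whitelist and the chosen tag list are assumptions, not part of the answer above. Everything is escaped first, and only exact, attribute-free tags are turned back into HTML.

```php
<?php
// Hypothetical helper: escape everything, then restore an exact whitelist.
function escape_with_whitelist(string $input): string
{
    $escaped = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

    // Only these exact strings are restored; a tag with attributes never matches.
    $allowed = ['<b>', '</b>', '<i>', '</i>', '<br />'];

    // Build the escaped form of each allowed tag to search for.
    $search = [];
    foreach ($allowed as $tag) {
        $search[] = htmlspecialchars($tag, ENT_QUOTES, 'UTF-8');
    }

    return str_replace($search, $allowed, $escaped);
}

echo escape_with_whitelist('<b>bold</b> <script>alert("x")</script>');
// prints: <b>bold</b> &lt;script&gt;alert(&quot;x&quot;)&lt;/script&gt;
```

Because only literal strings are matched, <b onclick="..."> stays escaped. The trade-off is that tags needing attributes, such as links, cannot be supported this way.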

soulmerge
+2  A: 

It is very, very difficult to allow only some HTML tags without allowing any possibility of script injection or the like.

I would actually recommend avoiding this and using something that generates HTML such as this UBB code parser (or similar). Or even Markdown (with HTML option turned off).

That gives no scope for attackers to hit your site, which is very important if it is public-facing.

If you allow even some HTML through, chances are that a determined attacker will find a way round it.
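To illustrate the "generate the HTML yourself" approach, here is a deliberately tiny BBCode-style renderer. It is a sketch, not a real parser: the function name render_bbcode is hypothetical, only [b] and [i] are handled, and mis-nested codes are not repaired. A maintained parser library would be the better choice in practice.

```php
<?php
// Hypothetical minimal renderer supporting only [b], [i] and line breaks.
function render_bbcode(string $input): string
{
    // Escape first so no user-supplied HTML survives.
    $html = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

    // Convert matched pairs only; unmatched codes stay as literal text.
    $html = preg_replace('~\[b\](.*?)\[/b\]~s', '<b>$1</b>', $html);
    $html = preg_replace('~\[i\](.*?)\[/i\]~s', '<i>$1</i>', $html);

    // Turn newlines into <br /> tags.
    return nl2br($html);
}

echo render_bbcode('[b]bold[/b] <script>');
// prints: <b>bold</b> &lt;script&gt;
```

The only HTML that reaches the page is what the renderer itself emits, so the user never gets to write a tag or an attribute directly.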

Phill Sacre
+3  A: 
Tired of using BBCode due to the current landscape of deficient or insecure HTML filters?
--> HTML Purifier
HTML Purifier is a standards-compliant HTML filter library written in PHP. HTML Purifier will not only remove all malicious code (better known as XSS) with a thoroughly audited, secure yet permissive whitelist, ...
VolkerK
learnt about this yesterday and a whole nightmare area of security is solved.
esryl
A: 

This isn't as simple as you might think, because neither htmlspecialchars() nor htmlentities() provides an option to ignore certain tags (neither function has any notion of tags at all).

You could use some other means to allow the users to format their posts, e.g. BBCode, Textile or Markdown. There are PHP parsers available for all of them.

If you have to stick with HTML tags, you could resort to some preprocessing that reformats the allowed tags so that they will not be affected by htmlspecialchars(). You can then postprocess the result to change the format back to normal HTML tags. The following sample visualizes this process for a simple <a>-tag. Please be aware that processing HTML with regular expressions is error-prone and not always the way to go - I'll use it just for the sake of simplicity in this example.

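// Wrap each tag in |#...#| placeholders so htmlspecialchars() leaves it alone.
$input = preg_replace('~<(/?\w+([^>]*?))>~', '|#$1#|', $input);
$input = htmlspecialchars($input);
// Turn the placeholders back into tags; the pipes must be escaped in the pattern.
$input = preg_replace('~\|#(/?\w+(.*?))#\|~', '<$1>', $input);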

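This is untested and will surely require a lot more work. As written, the pattern lets every tag through, so it still has to be narrowed to the allowed tag names.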

Stefan Gehrig