views:

153

answers:

6

I am currently working on a project with a PHP frontend. We're pretty concerned about security, because we'll have quite a lot of users and are an attractive target for hackers. Our users are able to submit HTML-formatted content that is later visible to other users. This is a big problem because it leaves us vulnerable to the whole range of XSS attacks. We're filtering as well as we can, but the variety of attack vectors is pretty big.

So, I'm searching for PHP-based HTML sanitizing/filtering solutions. Commercial solutions are fine (even preferred). Currently we're using a modified HTML Purifier, but we're not satisfied with the results.

Does anyone know of good libraries/tools that are capable of filtering out the malicious parts of HTML?

A nice-to-have would be, for example, HTML5 awareness, since HTML5 will become a security nightmare once it's available "in the wild".

Update: We're doing an in-depth configuration of HTML Purifier. It looks like the older framework we used before was just not configuring it at all. Now the results look much better.

+1  A: 

kses works well. You can easily specify which elements to allow and disallow, so making it ‘HTML5-aware’ would just be a matter of setting an array.

WordPress uses it, so I guess it’s pretty safe ;)
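
As a rough sketch of what "setting an array" could look like: kses takes a nested array mapping each allowed element to its allowed attributes. The element list below and the filter_user_html() wrapper are assumptions made for illustration, not part of kses, and the call only works once kses.php has been included.

```php
<?php
// Whitelist in the nested-array shape kses expects:
// element name => array of allowed attribute names.
$allowed_html = array(
    'a'          => array('href' => array(), 'title' => array()),
    'em'         => array(),
    'strong'     => array(),
    'p'          => array(),
    'blockquote' => array(),
    // Allowing a newer element is one more entry (illustrative choice).
    'section'    => array(),
);

// filter_user_html() is a hypothetical wrapper, not part of kses.
function filter_user_html($html, array $allowed_html)
{
    if (!function_exists('kses')) {
        // kses.php has not been included: fail closed by escaping everything.
        return htmlspecialchars($html, ENT_QUOTES, 'UTF-8');
    }
    // kses($string, $allowed_html, $allowed_protocols)
    return kses($html, $allowed_html, array('http', 'https'));
}
```

Without kses loaded, the wrapper falls back to escaping the whole string, so nothing is rendered as markup by accident.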

Mathias Bynens
HTML5 will be a little bit more tricky, as it will introduce, for example, attributes on closing tags. And the author states on the SourceForge page that he has no time to maintain the package, which is a deal breaker :-/
Patrick Cornelissen
+1  A: 

I can really recommend kses for HTML filtering. Actually, that's what WordPress uses. It's free and open source.

alexn
If kses is used with WordPress it MUST be good. ;)
Brian Lacy
But it's not actively maintained :-(
Patrick Cornelissen
+4  A: 

HTML Purifier project

Personally I have had very good results with the HTML Purifier project

It is highly customizable and has a huge code base. The only issue is uploading the files to your server.

Are you sure you don't have a configuration issue with your installation? If it is configured correctly, the purifier should not let through any HTML tags other than the ones you explicitly allow.

From the web site:

HTML Purifier is a standards-compliant HTML filter library written in PHP. HTML Purifier will not only remove all malicious code (better known as XSS) with a thoroughly audited, secure yet permissive whitelist, it will also make sure your documents are standards compliant, something only achievable with a comprehensive knowledge of W3C's specifications. Tired of using BBCode due to the current landscape of deficient or insecure HTML filters? Have a WYSIWYG editor but never been able to use it? Looking for high-quality, standards-compliant, open-source components for that application you're building? HTML Purifier is for you!

I wrote an article about how to use the HTML purifier library with CodeIgniter here.

Maybe it will help with giving it another try:

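// Load the default config and override settings as necessary.
// Assumes HTML Purifier 4.x (dotted directive names); adjust the include path to your install.
require_once 'HTMLPurifier.auto.php';
$config = HTMLPurifier_Config::createDefault();
$config->set('HTML.Doctype', 'XHTML 1.0 Transitional');
$config->set('HTML.AllowedElements', 'a,em,blockquote,p,strong,pre,code');
$config->set('HTML.AllowedAttributes', 'a.href,a.title');
$config->set('HTML.TidyLevel', 'light');
// $dirty_html is a placeholder for the user-submitted markup.
$purifier = new HTMLPurifier($config);
$clean_html = $purifier->purify($dirty_html);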
Jon Winstanley
Just to note, the OP did say: "Currently we're using a **modified html purifier**, but **we're not satisfied with the results**."
Sean Vieira
It is strange that the HTML purifier is not working for the asker. I presume the config is set up wrong at some point.
Jon Winstanley
It's possible that we can get around some of the problems with further configuration, but we had some cases where our penetration testers bypassed it. Currently HTML Purifier is our best bet, but I was hoping to find another glorious, shiny solution here ;-)
Patrick Cornelissen
+2  A: 

CodeIgniter has an excellent XSS filter. You could rip it out of the system/libraries/Input.php file if you wanted it as a standalone function.

fire
I'll have a look at it
Patrick Cornelissen
Hmm, I fear that this doesn't meet our non-functional requirements on "foreign" code. Ripping code from another framework may be appreciated by the PM ;-)
Patrick Cornelissen
+1  A: 

You can keep your current solution and add iframes with a different base URL to display the content. Because each iframe is served from a different base URL, JavaScript inside it cannot access the main page. For example, if your URL is http://www.yoururl.com/thread/500, the iframes could load their content from something like http://yoururl.com/thread/500/coment/1 and http://yoururl.com/thread/500/coment/2. Which base URL you can use depends on your DNS/host configuration.
It's not a solution that fixes the problem, just a way to sidestep it, although it can be useful until you find something else.
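
A minimal sketch of how the page could emit such an iframe, assuming the content is reachable under a second host name. content_iframe() and its parameters are hypothetical names, and the isolation relies on the browser treating the two host names as different origins.

```php
<?php
// content_iframe() is a hypothetical helper; $contentHost is assumed to be
// a host name that differs from the one serving the main page.
function content_iframe($contentHost, $threadId, $commentId)
{
    // %d forces the ids to integers, so they cannot carry markup.
    $url = sprintf('http://%s/thread/%d/coment/%d', $contentHost, $threadId, $commentId);

    // Escape the URL before placing it inside the attribute.
    return '<iframe src="' . htmlspecialchars($url, ENT_QUOTES, 'UTF-8') . '"></iframe>';
}
```

Calling content_iframe('yoururl.com', 500, 1) from a page served on www.yoururl.com would produce an iframe that points at the first URL from the example above.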

useless
That's not really an option due to several constraints. Sorry :)
Patrick Cornelissen
+1  A: 

I've used this class before and had pretty decent success: http://www.phpclasses.org/browse/package/2189.html

Brian Lacy
Looks like it's no longer maintained. Last release was 2005 and there have been many new exploits that are most likely not covered by this.
Patrick Cornelissen