views: 75
answers: 4

I have a PHP web application. I do NOT want to allow users to post HTML to my site.

If I simply run strip_tags() on all data prior to saving it into my database, will strip_tags() be enough to prevent XSS?

I ask because it's unclear to me from reading the strip_tags() documentation whether XSS is prevented. There also seems to be some browser bug that allows <0/script> (yes, a zero) as valid HTML.

UPDATE

I realize that I could simply run htmlspecialchars on all outputted data; however, since I don't want to allow HTML in the first place, my thinking is that it's simpler (and academically better) to clean the data once and for all before saving it to my database, rather than having to worry about whether the data is safe every single time I output it.
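For reference, the escape-on-output approach mentioned above is a one-liner at the point where data meets HTML; being explicit about ENT_QUOTES and the charset is worthwhile (the `$comment` variable here is just a stand-in for user-supplied data):

```php
<?php
// Escape user-supplied data at output time so the browser never interprets
// it as markup. ENT_QUOTES also escapes single quotes, which matters when
// the value lands inside an attribute like value='...'.
$comment = '<script>alert("XSS")</script>';
echo htmlspecialchars($comment, ENT_QUOTES, 'UTF-8');
// Prints: &lt;script&gt;alert(&quot;XSS&quot;)&lt;/script&gt;
```

The browser then renders the literal text `<script>alert("XSS")</script>` instead of executing anything.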

A: 

strip_tags() can help, but it's not bulletproof. Since it doesn't validate the HTML it's stripping, some clever person WILL eventually find an HTML construct (mangled or otherwise) that survives the stripping but still results in something nasty getting through. For now it should handle most everything that gets thrown at it; just don't assume that this will be true forever.

As well, if you allow any tags to pass through via the 'allowable tags' parameter, those tags are let through with all of their attributes intact, including JavaScript-specific ones such as onclick.
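That caveat is easy to demonstrate: strip_tags() does not inspect the attributes on tags it has been told to allow, so an event handler rides straight through.

```php
<?php
// Allowing <b> keeps the whole tag, attributes and all; the <script> tags
// are stripped, but their inner text survives as plain text.
$input = '<b onclick="alert(1)">bold</b> <script>alert(2)</script>';
echo strip_tags($input, '<b>');
// Prints: <b onclick="alert(1)">bold</b> alert(2)
```

So the moment any tag is allowed, attribute filtering becomes your problem, not strip_tags()'s.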

Marc B
+1  A: 

It should; I have never heard of that 0 trick before. But you can always run strip_tags and then htmlspecialchars just to be safe. Good practice is to test this on your own application, since you know what kind of data can be entered: search for known methods of XSS exploits and use them as your test data to see if anything breaks through. I would check at least weekly for new vulnerabilities and keep testing your script against new exploits as they come out.

Brad F Jacobs
+1 Use `htmlspecialchars()` anyway to ensure at least nothing gets *accidentally* parsed as if it were HTML.
BoltClock
Please see my UPDATE to my original post.
JimmyL
+3  A: 

As others have mentioned, you can use a combination of strip_tags and htmlspecialchars to protect yourself against XSS.

One bad thing about strip_tags is that it can remove harmless content that the user will not expect it to. I see techies write stuff like <edit> foo </edit>, where they fully expect those tags to be shown as-is. I've also seen "normal" people do things like <g> for "grin." Again, they will think it's a bug if that doesn't show up.

So personally, I avoid strip_tags in favor of my own parser, which lets me explicitly enable certain safe HTML tags, attributes, and CSS, explicitly disable unsafe tags and attributes, and convert any other special characters to harmless versions. That way the text always appears as the user would expect.
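In the same spirit, a minimal allowlist sanitizer can be sketched with PHP's DOMDocument. This is an illustrative toy, not the answerer's actual parser: it keeps a hardcoded set of safe tags, deliberately drops every attribute, and escapes all other markup so it displays as literal text.

```php
<?php
// Toy allowlist sanitizer (illustrative sketch only): keep <b>, <i>, <em>,
// <strong>, drop every attribute, unwrap unsafe elements, escape the rest.
function sanitize_html(string $html): string {
    $doc = new DOMDocument();
    // The <?xml hint forces UTF-8; the <div> wrapper makes fragments parse
    // predictably. libxml errors from sloppy input are suppressed.
    $doc->loadHTML('<?xml encoding="UTF-8"?><div>' . $html . '</div>',
                   LIBXML_NOERROR | LIBXML_NOWARNING);
    $allowed = ['b', 'i', 'em', 'strong'];
    $out = '';
    $walk = function (DOMNode $node) use (&$walk, &$out, $allowed): void {
        foreach ($node->childNodes as $child) {
            if ($child instanceof DOMText) {
                $out .= htmlspecialchars($child->nodeValue, ENT_QUOTES, 'UTF-8');
            } elseif ($child instanceof DOMElement) {
                $tag = strtolower($child->tagName);
                if (in_array($tag, $allowed, true)) {
                    $out .= "<$tag>";   // attributes are deliberately dropped
                    $walk($child);
                    $out .= "</$tag>";
                } else {
                    $walk($child);      // unwrap unsafe element, keep its text
                }
            }
        }
    };
    $walk($doc->getElementsByTagName('div')->item(0));
    return $out;
}

// Event handlers vanish along with every other attribute:
echo sanitize_html('<b onclick="alert(1)">hi</b>'); // prints <b>hi</b>
```

A real parser needs per-tag attribute and CSS allowlists on top of this, but the walk-and-rebuild structure is the core idea.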

If I didn't have that parser at my disposal, I would simply use htmlspecialchars to safely encode the text.

konforce
+2  A: 

strip_tags by itself is not going to be sufficient, because it also removes perfectly valid, non-HTML content. For instance:

<?php
 echo strip_tags("This could be a happy clown *<:) or a puckered face.\n");
 echo strip_tags("Hey guys <--- look at this!\n");

will output:

This could be a happy clown *

and:

Hey guys

Everything after the initial < gets removed. Very annoying for end users! Disallowing the reserved HTML characters outright would be a bad move; instead, those characters need to be escaped with htmlentities or a similar function whenever they are used inline with HTML.

You need something more advanced than strip_tags - HTML Purifier works great and will allow users to use HTML reserved characters.

pygorex1