views: 114

answers: 6

I already know how XSS works, but trying to learn every one of the many different ways to inject malicious input is not an option.

I saw a couple of libraries out there, but most of them are very incomplete, inefficient, or GPL-licensed (when will you guys learn that the GPL is not good for sharing little libraries! Use MIT)

+8  A: 

htmlspecialchars() is the only function you should know about.
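
In practice, a minimal sketch for HTML output, assuming a UTF-8 page (the helper name is made up for illustration):

// minimal sketch: encode untrusted text for an HTML context;
// ENT_QUOTES also covers single-quoted attributes
// (unquoted attributes remain risky, as the comments below note)
function e($text) {
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

echo '<p>Hello, ' . e($_GET['name']) . '</p>';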

zerkms
+1 Encode your output. It really is that simple.
meagar
Unfortunately, that's not enough. If you HTML-encode characters used in JavaScript, you'll have bad data in your JS. Same for characters placed in URLs. Also, there are use cases where the function won't prevent XSS, such as tag attributes without encapsulating single or double quotes (since whitespace is not encoded by htmlspecialchars)
atk
@atk: any samples?
zerkms
@zerkms: IIRC, JS requires \xx where xx is the hex code of the byte. URLs require %xx, again where xx is hex. A good JS example of badly encoded data would be alert("c=d (assuming ; isn't treated as a special char in the URL scheme; I don't remember if it is or not, off the top of my head). True, you won't have XSS, but your functionality won't work, either.
atk
@atk: convincing, +1
zerkms
Sure, you need the right form of encoding for your output context. That's most often `htmlspecialchars()` for HTML, but could be `rawurlencode()`, `json_encode()`, `mysql_real_escape_string()`, whatever. The main point is, this depends on the output stage and is *not* something that can be handled on the input using “anti-XSS” measures.
bobince
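
To make the context point concrete, a sketch of per-context encoding in PHP (variable names invented for illustration):

// untrusted input
$name = (string) $_GET['name'];

// HTML body or attribute context
echo '<p>' . htmlspecialchars($name, ENT_QUOTES, 'UTF-8') . '</p>';

// URL parameter inside an HTML attribute: two nested contexts,
// so URL-encode first, then HTML-encode the result
echo '<a href="/search?q=' . htmlspecialchars(rawurlencode($name), ENT_QUOTES, 'UTF-8') . '">search</a>';

// JavaScript string context (on PHP 5.3+, adding JSON_HEX_TAG also
// guards against a literal </script> inside the string)
echo '<script>var name = ' . json_encode($name) . ';</script>';
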
+2  A: 

I like htmlpurifier fine, but I see how it could be inefficient, since it's fairly large. Also, it's LGPL, and I don't know if that falls under your GPL ban.

grossvogel
+1  A: 

In addition to zerkms's answer, if you find you need to accept user-submitted HTML (from a WYSIWYG editor, for example), you will need to use an HTML parser to determine what can and can't be submitted.

I use and recommend HTML Purifier.

Note: Don't even try to use regex :)
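
A minimal HTML Purifier sketch, assuming the standalone autoloader shipped with the library:

require_once 'HTMLPurifier.auto.php';

$config = HTMLPurifier_Config::createDefault();
// whitelist approach: only these elements/attributes survive
$config->set('HTML.Allowed', 'p,b,i,ul,ol,li,a[href]');

$purifier = new HTMLPurifier($config);
$clean_html = $purifier->purify($dirty_html);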

alex
+4  A: 

OWASP offers an encoding library on which time has been spent to handle the various edge cases.

http://www.owasp.org/index.php/Category:OWASP_Encoding_Project

atk
That one looks great, and is MIT licensed. Perfect!
HappyDeveloper
+2  A: 

Edit: Thank you @mario for pointing out that it all depends on the context. There really is no single way to prevent it all on all occasions. You have to adjust accordingly.


Edit: I stand corrected and am very appreciative of both @bobince's and @Rook's help on this issue. It's pretty much clear to me now that strip_tags will not prevent XSS attacks in any way.

I've scanned all my code prior to answering to see if I was in any way exposed, and all is good because of the htmlentities($a, ENT_QUOTES) I've been using, mainly to comply with W3C validation.

That said, I've updated the function below to somewhat mimic the one I use. I still find strip_tags nice to have before htmlentities, so that when a user does try to enter tags they will not pollute the final outcome. Say the user entered <b>ok!</b>: it's much nicer to show it as ok! than to print out the full htmlentities-converted text.

Thank you both very much for taking the time to reply and explain.


If it's coming from internet user:

// text from an internet user should not carry tags in the first place,
// so strip them out, then encode whatever is left (quotes included)
function clean_up($text) {
    return htmlentities(strip_tags($text), ENT_QUOTES, 'UTF-8');
}
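
For example, with the function above (expected output in the comments):

echo clean_up('<b>ok!</b>');
// -> ok!

echo htmlentities('<b>ok!</b>', ENT_QUOTES, 'UTF-8');
// -> &lt;b&gt;ok!&lt;/b&gt;  (displays as the literal text <b>ok!</b>)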

If it's coming from the backoffice... don't.

There are perfectly valid reasons why someone at the company may need JavaScript on this or that page. It's much better to be able to log and blame than to shut down your users.

Frankie
`strip_tags` is not a security measure. This allows all sorts of XSS badness through, such as `<div onmouseover="alert('script injection!')">`. There's almost never a good reason to use `strip_tags`.
bobince
@bobince, you're perfectly correct. I should have revised my function before copy-pasting it. `strip_tags` is pretty effective in removing **ALL XSS** as long as you strip all tags out.
Frankie
-1 because XSS can still get past this. strip_tags() is garbage. The correct answer is `htmlspecialchars($var, ENT_QUOTES);`
Rook
@Frankie but you don't need tags to exploit xss. http://stackoverflow.com/questions/3762746/todays-xss-onmouseover-exploit-on-twitter-com
Rook
@Rook, @bobince I've updated the question to reflect your comments. Thank you again for taking the time to reply.
Frankie
These comments are somewhat misleading. `strip_tags` does strip *all* HTML tags out. It is therefore a valid help against raw HTML injection. `htmlspecialchars` **and** `urlencode` are required *in addition* if received data is to be put verbatim into tag/attribute context. But that's the crux: **it all depends on the context**. `htmlspecialchars` alone is of no help if the target context is RSS, for example, because `<script>` would result in an XSS exploit over there.
mario
@mario actually browsers automatically do an HTML decode on (some?) requests. Try posting HTML-encoded quote marks and greater-than and less-than symbols. Also you can use `htmlspecialchars($var, ENT_QUOTES);` to stop all XSS, except for *some cases* when the output is already in a `<script>` tag
Rook
@Frankie yep, that is the proper method for stopping XSS, I gave you a +1. SO is great for learning tricky shit like this, isn't it?
Rook
@Rook, what I meant is that in that particular case (Twitter), a urlencode would have been the better fix. Any double quote gets turned into %22, a single quote into %27, and angle brackets into %3C and %3E. Which way you encode input data is obviously irrelevant to browsers in most cases if they pass raw data on into the next URL. That's why I think strip_tags is not useless per se. | Also I fear the original questioner went away without knowing about `ENT_QUOTES`, which you pointed out, and without which htmlspecialchars isn't that useful.
mario
@mario You're right, Twitter was writing a URL to the page. Also I think you're right about the OP, oh well. He was severely misinformed, because he was looking for a "library" to do this; talk about overkill.
Rook
@Rook SO is just amazing. The way we can interact, explore, share and "suck less"... is just close to perfection. Thank you once more!
Frankie
+1  A: 

HTML Purifier is the undisputed best option for cleansing HTML input, and htmlspecialchars() should be applied to anything else.

But XSS vulnerabilities should not be cleaned out, because any such submission is garbage anyway. Rather, make your application bail and write a log entry. The best filter set for achieving XSS detection is in the mod_security core rules.

I'm using an inconspicuous but quite thorough attribute detection here in new input(), see the _xss method.
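
A rough sketch of the bail-and-log idea; the regex below is a crude stand-in for a real rule set like mod_security's, not a complete filter:

// crude illustration only: real detection belongs in something like
// the mod_security core rules, not a single hand-rolled regex
function bail_on_suspicious($input) {
    if (preg_match('/<script|onerror\s*=|onmouseover\s*=|javascript:/i', $input)) {
        error_log('possible XSS attempt from ' . $_SERVER['REMOTE_ADDR'] . ': ' . $input);
        header('HTTP/1.0 400 Bad Request');
        exit('Bad request.');
    }
}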

mario