views: 114

answers: 6

I already know how XSS works, but trying to learn every one of the many different ways to inject malicious input is not an option.

I saw a couple of libraries out there, but most of them are very incomplete, inefficient, or GPL-licensed (when will you guys learn that the GPL is not good for sharing little libraries! Use MIT)

+8  A: 

htmlspecialchars() is the only function you should know about.
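
In practice, a minimal sketch for HTML output, assuming a UTF-8 page (the helper name is made up for illustration):

// minimal sketch: encode untrusted text for an HTML context;
// ENT_QUOTES also covers single-quoted attributes
// (unquoted attributes remain risky, as the comments below note)
function e($text) {
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

echo '<p>Hello, ' . e($_GET['name']) . '</p>';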

zerkms
+1 Encode your output. It really is that simple.
meagar
Unfortunately, that's not enough. If you HTML-encode characters used in JavaScript, you'll have bad data in your JS. Same for characters placed in URLs. Also, there are use cases where the function won't prevent XSS, such as tag attributes without encapsulating single or double quotes (since whitespace is not encoded by htmlspecialchars)
atk
@atk: any samples?
zerkms
@zerkms: IIRC, JS requires \xx where xx is the hex code of the byte. URLs require %xx, again where xx is hex. A good JS example of badly encoded data would be alert("c=d (assuming ; isn't treated as a special char in the URL scheme; I don't remember if it is or not, off the top of my head). True, you won't have XSS, but your functionality won't work, either.
atk
@atk: convincing, +1
zerkms
Sure, you need the right form of encoding for your output context. That's most often `htmlspecialchars()` for HTML, but could be `rawurlencode()`, `json_encode()`, `mysql_real_escape_string()`, whatever. The main point is, this depends on the output stage and is *not* something that can be handled on the input using “anti-XSS” measures.
bobince
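
To make the context point concrete, a sketch of per-context encoding in PHP (variable names invented for illustration):

// untrusted input
$name = (string) $_GET['name'];

// HTML body or attribute context
echo '<p>' . htmlspecialchars($name, ENT_QUOTES, 'UTF-8') . '</p>';

// URL parameter inside an HTML attribute: two nested contexts,
// so URL-encode first, then HTML-encode the result
echo '<a href="/search?q=' . htmlspecialchars(rawurlencode($name), ENT_QUOTES, 'UTF-8') . '">search</a>';

// JavaScript string context (on PHP 5.3+, adding JSON_HEX_TAG also
// guards against a literal </script> inside the string)
echo '<script>var name = ' . json_encode($name) . ';</script>';
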
+2  A: 

I like htmlpurifier fine, but I see how it could be inefficient, since it's fairly large. Also, it's LGPL, and I don't know if that falls under your GPL ban.

grossvogel
+1  A: 

In addition to zerkms's answer, if you find you need to accept user-submitted HTML (from a WYSIWYG editor, for example), you will need to use an HTML parser to determine what can and can't be submitted.

I use and recommend HTML Purifier.

Note: Don't even try to use regex :)
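
A minimal HTML Purifier sketch, assuming the standalone autoloader shipped with the library:

require_once 'HTMLPurifier.auto.php';

$config = HTMLPurifier_Config::createDefault();
// whitelist approach: only these elements/attributes survive
$config->set('HTML.Allowed', 'p,b,i,ul,ol,li,a[href]');

$purifier = new HTMLPurifier($config);
$clean_html = $purifier->purify($dirty_html);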

alex
+4  A: 

OWASP offers an encoding library on which time has been spent to handle the various edge cases.

http://www.owasp.org/index.php/Category:OWASP_Encoding_Project

atk
That one looks great, and is MIT licensed. Perfect!
HappyDeveloper
+2  A: 

Edit: Thank you @mario for pointing out that it all depends on the context. There really is no single way to prevent it all on all occasions. You have to adjust accordingly.


Edit: I stand corrected and am very appreciative of both @bobince's and @Rook's help on this issue. It's pretty much clear to me now that strip_tags will not prevent XSS attacks in any way.

I've scanned all my code prior to answering to see if I was in any way exposed, and all is good because of the htmlentities($a, ENT_QUOTES) I've been using, mainly to comply with W3C validation.

That said, I've updated the function below to somewhat mimic the one I use. I still find strip_tags nice to have before htmlentities, so that when a user does try to enter tags they will not pollute the final outcome. Say the user entered <b>ok!</b>: it's much nicer to show it as ok! than to print out the full htmlentities-converted text.

Thank you both very much for taking the time to reply and explain.


If it's coming from internet user:

// text from an internet user should not carry tags in the first place,
// so strip them out, then encode whatever is left (quotes included)
function clean_up($text) {
    return htmlentities(strip_tags($text), ENT_QUOTES, 'UTF-8');
}
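
For example, with the function above (expected output in the comments):

echo clean_up('<b>ok!</b>');
// -> ok!

echo htmlentities('<b>ok!</b>', ENT_QUOTES, 'UTF-8');
// -> &lt;b&gt;ok!&lt;/b&gt;  (displays as the literal text <b>ok!</b>)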

If it's coming from the backoffice... don't.

There are perfectly valid reasons why someone at the company may need JavaScript on this or that page. It's much better to be able to log and blame than to shut down your users.

Frankie
`strip_tags` is not a security measure. This allows all sorts of XSS badness through, such as `<div onmouseover="alert('script injection!')">`. There's almost never a good reason to use `strip_tags`.
bobince
@bobince, you're perfectly correct. I should have revised my function before copy-pasting it. `strip_tags` is pretty effective in removing **ALL XSS** as long as you strip all tags out.
Frankie
-1 because XSS can still get past this. strip_tags() is garbage. The correct answer is `htmlspecialchars($var, ENT_QUOTES);`
Rook
@Frankie but you don't need tags to exploit xss. http://stackoverflow.com/questions/3762746/todays-xss-onmouseover-exploit-on-twitter-com
Rook
@Rook, @bobince I've updated the question to reflect your comments. Thank you again for taking the time to reply.
Frankie
These comments are somewhat misleading. `strip_tags` does strip *all* HTML tags out. It is therefore a valid help against raw HTML injection. `htmlspecialchars` **and** `urlencode` are required *in addition* if received data is to be put verbatim into tag/attribute context. But that's the crux: **it all depends on the context**. `htmlspecialchars` alone is of no help if the target context is RSS, for example, because `<script>` would result in an XSS exploit over there.
mario
@mario actually browsers automatically do an HTML decode on (some?) requests. Try posting HTML-encoded quote marks and greater-than and less-than symbols. Also you can use `htmlspecialchars($var, ENT_QUOTES);` to stop all XSS, except for *some cases* when the output is already in a `<script>` tag
Rook
@Frankie yep, that is the proper method for stopping XSS, I gave you a +1. SO is great for learning tricky shit like this, isn't it?
Rook
@Rook, what I meant is that in that particular case (Twitter), a urlencode would have been the better fix. Any double quote gets turned into %22, a single quote into %27, and angle brackets into %3C and %3E. Which way you encode input data is obviously irrelevant to browsers in most cases if they pass raw data on into the next URL. That's why I think strip_tags is not useless per se. | Also I fear the original questioner went away without knowing about `ENT_QUOTES`, which you pointed out, and without which htmlspecialchars isn't that useful.
mario
@mario You're right, Twitter was writing a URL to the page. Also I think you're right about the OP, oh well. He was severely misinformed, because he was looking for a "library" to do this; talk about overkill.
Rook
@Rook SO is just amazing. The way we can interact, explore, share and "suck less"... is just close to perfection. Thank you once more!
Frankie
+1  A: 

HTML Purifier is the undisputed best option for cleansing HTML input, and htmlspecialchars() should be applied to anything else.

But XSS vulnerabilities should not be cleaned out, because any such submission is garbage anyway. Rather, make your application bail and write a log entry. The best filter set for achieving XSS detection is in the mod_security core rules.

I'm using an inconspicuous but quite thorough attribute detection here in new input(), see the _xss method.
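
A rough sketch of the bail-and-log idea; the regex below is a crude stand-in for a real rule set like mod_security's, not a complete filter:

// crude illustration only: real detection belongs in something like
// the mod_security core rules, not a single hand-rolled regex
function bail_on_suspicious($input) {
    if (preg_match('/<script|onerror\s*=|onmouseover\s*=|javascript:/i', $input)) {
        error_log('possible XSS attempt from ' . $_SERVER['REMOTE_ADDR'] . ': ' . $input);
        header('HTTP/1.0 400 Bad Request');
        exit('Bad request.');
    }
}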

mario