Is this regular expression enough to catch all cross-site scripting attempts when embedding HTML into the DOM, for example with document.write()?

(javascript:|<\s*script.*?\s*>)

It is referenced in this document from modsecurity.org: http://www.modsecurity.org/documentation/Ajax%5FFingerprinting%5Fand%5FFiltering%5Fwith%5FModSecurity%5F2.0.pdf

Would it catch all <\s*script.*?\s*> variants in UTF-8, for instance?

+2  A: 

Unfortunately not. There are quite a few ways to sneak past that regex if an attacker is really trying. Against modern browsers it should catch the most obvious attempts, but it's not exhaustive. For example, something along these lines runs JavaScript without explicitly saying script or javascript:

<img src="blah.jpg" alt="" onmousedown="alert('a')" />
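
To make that concrete, here is a minimal Python sketch that runs the question's regex against a few payloads. The case-insensitive flag and the two extra payloads (the body onload and the entity-encoded href) are my own assumptions for illustration; they are not taken from the ModSecurity document.

```python
import re

# The pattern from the question. IGNORECASE is an assumption: the question
# does not say which flags the filter applies.
PATTERN = re.compile(r"(javascript:|<\s*script.*?\s*>)", re.IGNORECASE)

payloads = [
    # Plain script tag: the one case the regex is built for.
    "<script>alert(1)</script>",
    # Event-handler attribute from the example above.
    '<img src="blah.jpg" alt="" onmousedown="alert(\'a\')" />',
    # Another event handler (illustrative payload, not from the thread).
    '<body onload="alert(1)">',
    # Entity-encoded "s": the browser decodes it inside the attribute value,
    # but the literal text "javascript:" never appears in the input.
    '<a href="java&#115;cript:alert(1)">x</a>',
]


def is_flagged(text):
    """Return True if the regex from the question matches anywhere in text."""
    return PATTERN.search(text) is not None


if __name__ == "__main__":
    for p in payloads:
        print("flagged" if is_flagged(p) else "MISSED ", p)
```

Only the first payload is flagged; the other three pass straight through the filter.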

Check out here (somewhat outdated but gets the point across) and here for more examples

Mercurybullet
I was going to mention this, but the link above to http://www.owasp.org/index.php/XSS%5F%28Cross%5FSite%5FScripting%29%5FPrevention%5FCheat%5FSheet does a good job of it. Even CSS can be a source of XSS. So many attack vectors.
Ben
Yes, nice link(s).
Crescent Fresh
Thanks for those sources and simple example.
bucabay
+1  A: 

Is this regular expression enough to catch all cross site scripting attempts

Hahahahahahahahahaha.

Sorry. But really... no, that's not even the tip of the iceberg.

Daniel has mentioned one other method of injecting script, but really there are hundreds. It is not at all possible to sanitise HTML using a simple regex. The only approach (and even then it's not trivial) is to properly parse the HTML, throwing out all malformed sequences and element/attribute names except for a few known-safe ones.
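
As a rough illustration of the parse-and-allowlist idea, here is a minimal Python sketch built on the standard library's html.parser. The ALLOWED_TAGS set and the sanitize() helper are hypothetical names I made up for the example. It drops every attribute and every tag not on the list, and it does not balance or nest tags correctly, so treat it as a sketch of the approach rather than something to deploy; a maintained sanitizer library is the safer choice in practice.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allowlist: a handful of formatting tags, no attributes at all.
ALLOWED_TAGS = {"b", "i", "em", "strong", "p", "br"}


class AllowlistSanitizer(HTMLParser):
    """Re-emit only allowlisted tags; everything else is dropped or escaped."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        # Attributes are discarded entirely, so onmousedown= and friends
        # cannot survive even on an allowed element.
        if tag in ALLOWED_TAGS:
            self.out.append("<%s>" % tag)

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS and tag != "br":
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        # Text content is re-escaped on the way out.
        self.out.append(escape(data))

    # Comments, doctype declarations and processing instructions fall through
    # to the base class's no-op handlers, so they are dropped.


def sanitize(html_text):
    """Return html_text reduced to allowlisted tags and escaped text."""
    parser = AllowlistSanitizer()
    parser.feed(html_text)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.out)


if __name__ == "__main__":
    print(sanitize('<p>Hi <b onclick="x()">there</b></p>'))
    print(repr(sanitize('<img src="blah.jpg" onmousedown="alert(1)" />')))
```

The key design choice is that the output is rebuilt from scratch out of known-safe pieces, instead of trying to spot known-bad pieces in the input.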

Of course this only applies when you are actually deliberately accepting HTML input and you want to limit its potential harm. If the situation is that you're accepting text but forgetting to escape it properly on the way out, you need to fix that HTML-escaping, because no amount of input-sniffing will fix an output-problem.
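
For the plain-text case, fixing the output side can be as small as this Python sketch. render_comment() is a hypothetical helper standing in for whatever templating the application really uses; the only real API here is html.escape from the standard library.

```python
from html import escape


def render_comment(user_text):
    """Wrap untrusted text in a paragraph, escaping it at output time.

    quote=True also escapes double and single quotes, which matters if the
    same value is ever placed inside an attribute.
    """
    return "<p>" + escape(user_text, quote=True) + "</p>"


if __name__ == "__main__":
    print(render_comment("This message contains the string <script>."))
```

With escaping done at the point of output, a message that mentions <script> is displayed as text and there is no need to reject it on the way in.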

This is why mod_security is utterly bogus. It is giving you the illusion of improved security by catching a few of the most basic injection techniques, while letting everything else through to a vulnerable application. It won't, in the end, prevent you from being hacked, but the more injection signatures you add, the more it'll deny and mess up legitimate requests. For example it might prevent me from entering this message because it contains the string <script>.

bobince
lol.. maybe your regex just isn't up to par :D I get your point.
bucabay