I need to check whether user-submitted HTML contains any JavaScript. I'm using PHP for validation.

Thanks for any help!

A: 

Scan for script tags, events (as Tom Haigh commented) and href="javascript:...".

Pawka
Downvote for what? What else can you suggest? It doesn't matter how you scan for these tags (regexp, strstr or some other way), you still need to scan.
Pawka
What about <span onmouseover="doEvil();">hello</span>?
Tom Haigh
OK, then you need to scan for script tags and events.
Pawka
...and href="javascript:..."
Pawka
I don't see a reason for the downvotes either...
KB22
One reason for the downvote might be that it endorses the "enumerating badness" antipattern: http://www.ranum.com/security/computer_security/editorials/dumb/
OJW
In other words, 'enumerating badness' could be called 'white listing'. But here I can only see the white listing way to solve the problem.
Pawka
Enumerating badness is black listing.
Tom Haigh
Yes, I mean black listing... My mistake.
Pawka
A: 

You could remove the script tags as Pawka states using regular expressions. I found a thread on this here.

Basically it's:

$list = preg_replace('#<script[^>]*>.*?</script>#is', '', $list);

Code is from that page, not written by me.

KB22
What about JavaScript that doesn't need to be in script tags, such as the stuff in 'onclick'?
Evernoob
Indeed, the regex needs to be extended to catch all event handlers too. But maybe several expressions would be easier to read... Anyway, this is a good point which I forgot to take into account.
KB22
As mentioned above, I think that filter gets owned by putting the end tag in an HTML comment (and I'm sure the real troublemakers have hundreds of other tricks). At least remove the ? from after the central .*
OJW
A: 

You'll need to scan for <script> tags, but you'll also need to scan for event attributes like onclick="" or onmouseover="", which can carry JavaScript without any script tags.
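
A rough sketch of that kind of scan, assuming PHP's bundled DOM extension is available. Parsing the markup first means attribute names and entity-encoded values are inspected after decoding, which a regex over the raw string does not give you. The function name `containsJavascript` is made up for this example, and it only covers the vectors mentioned in this thread (script elements, on* attributes, javascript: URLs), so a `false` result means "nothing obvious found", not "safe".

```php
<?php
// Hypothetical helper: returns true if the HTML contains a <script> element,
// an on* event attribute, or an attribute value starting with "javascript:".
function containsJavascript(string $html): bool
{
    $doc = new DOMDocument();
    // User input is often malformed; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('*') as $element) {
        if (strtolower($element->nodeName) === 'script') {
            return true;
        }
        foreach ($element->attributes as $attribute) {
            $name = strtolower($attribute->nodeName);
            // Drop whitespace and control characters before checking the scheme.
            $value = strtolower(preg_replace('/[\s\x00-\x1f]+/', '', $attribute->nodeValue));
            if (strncmp($name, 'on', 2) === 0) {
                return true; // onclick, onmouseover, ...
            }
            if (strncmp($value, 'javascript:', 11) === 0) {
                return true; // href="javascript:...", src="javascript:..."
            }
        }
    }
    return false;
}

var_dump(containsJavascript('<span onmouseover="doEvil();">hello</span>'));
var_dump(containsJavascript('<p>plain <b>text</b></p>'));
```

This is still a blacklist, so it is better suited to flagging suspicious input than to deciding what is safe to output.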

Evernoob
`<img src="javascript:whatever">`
OJW
+4  A: 

It might be better to take a different approach and use something like HTML Purifier to filter out anything that you don't want. I think it would be very difficult to safely remove any possibility of javascript without actually parsing the HTML properly.

Tom Haigh
+1 use an existing popular library. HTML parsing is *hard*, using your own hacked-up regex is a recipe for failure. I can't guarantee HTML Purifier is secure (there have been holes in it in the past), but it's going to be way ahead of your first attempt.
bobince
+6  A: 

If you want to protect yourself against Cross-Site Scripting (XSS), you are better off using a whitelist than a blacklist, because there are too many aspects to consider when looking for XSS attacks.

Just make a list of all HTML tags and attributes you want to allow and remove/escape all other tags/attributes. And for those attributes that can be used for XSS attacks, validate the values to only allow harmless values.
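
A minimal sketch of that whitelist idea, assuming PHP's bundled DOM extension. The tag list, the allowed URL schemes and the function names (`sanitizeHtml`, `sanitizeNode`, `isSafeUrl`) are all assumptions made up for illustration, not a vetted policy. Disallowed elements are removed together with their contents; escaping them instead would be an equally valid choice.

```php
<?php
// Hypothetical allowlist: tag name => attributes permitted on that tag.
const ALLOWED_TAGS = [
    'p' => [], 'b' => [], 'i' => [], 'em' => [], 'strong' => [], 'span' => [],
    'ul' => [], 'ol' => [], 'li' => [], 'br' => [],
    'a' => ['href'],
];

// Accept only URLs that start with an explicitly allowed scheme.
function isSafeUrl(string $url): bool
{
    return preg_match('#^(https?://|mailto:)#i', trim($url)) === 1;
}

function sanitizeNode(DOMNode $node): void
{
    // Iterate over a copy because the tree is modified during the walk.
    foreach (iterator_to_array($node->childNodes, false) as $child) {
        if ($child->nodeType === XML_TEXT_NODE) {
            continue; // plain text is kept and escaped on output
        }
        $tag = strtolower($child->nodeName);
        if (!($child instanceof DOMElement) || !array_key_exists($tag, ALLOWED_TAGS)) {
            $node->removeChild($child); // comments, <script>, unknown tags, ...
            continue;
        }
        foreach (iterator_to_array($child->attributes, false) as $attribute) {
            $name = strtolower($attribute->nodeName);
            $allowed = in_array($name, ALLOWED_TAGS[$tag], true);
            if ($allowed && $name === 'href') {
                $allowed = isSafeUrl($attribute->nodeValue);
            }
            if (!$allowed) {
                $child->removeAttribute($attribute->nodeName);
            }
        }
        sanitizeNode($child);
    }
}

function sanitizeHtml(string $html): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    // The XML declaration is a common trick to make loadHTML() read UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8"><div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Only the wrapper's subtree is serialized; anything outside it is dropped.
    $root = $doc->getElementsByTagName('div')->item(0);
    sanitizeNode($root);
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo sanitizeHtml('<p onclick="x()">Hi <b>there</b></p><script>alert(1)</script>'), "\n";
```

Anything not listed is dropped by default, so a vector nobody thought of fails closed instead of slipping through.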

Gumbo
+2  A: 

OK, let's not all be naive here:

<script> "<!-- </script> -->"; document.write("hello world"); </script> (should pass the filters suggested by regexadvice)

Filtering out JavaScript is security-critical, which means you need to do it thoroughly and properly, not with some quick hack.
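
To make the point concrete, here is a small sketch that runs the single-pass script-tag regex quoted earlier in this thread against two inputs. The wrapper name `stripScriptTags` is made up for the example. The first input is the event-handler markup from the comments; the second is a hypothetical nested-tag input that does not appear in the thread.

```php
<?php
// The regex filter from this thread, wrapped in a function so it can be exercised.
function stripScriptTags(string $html): string
{
    return preg_replace('#<script[^>]*>.*?</script>#is', '', $html);
}

// Event handler without any script tag: the filter has nothing to match.
$eventHandler = '<span onmouseover="doEvil();">hello</span>';

// A script tag split around a decoy <script></script> pair (illustrative input).
$nested = '<scr<script></script>ipt>alert(1)</script>';

var_dump(stripScriptTags($eventHandler) === $eventHandler); // bool(true)
echo stripScriptTags($nested), "\n"; // <script>alert(1)</script>
```

Removing the decoy pair stitches the outer fragments back into a working script tag, so one pass of the filter produces exactly what it was meant to remove.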

OJW