Hi

I am trying to figure out all the ways JavaScript can be written into HTML. I am making a whitelist of acceptable tags, but the attributes are getting me.

In my rich HTML editor I allow things like links.

<a href="">Hi </a>

Now I am using HTML Agility Pack to get rid of the attributes, and for that matter the HTML tags, that I won't support.

However, I am still unclear whether a person could do something like this:

<a href="<script>alert('hi')</script>">Bad </a>

So I am not sure whether I have to start looking at the values of all the attributes I support and HTML-encode them, or do something else.

I am also not sure how to prevent an HTML link that goes to some page and launches some JavaScript on load.

I am not sure whether a whitelist can stop that one.

+4  A: 
<a href="javascript:void(0)" onclick="alert('hi');">Bad</a>

or

<a href="javascript:alert('hi');">Bad</a>
Duniyadnd
Ah! Please people - stop putting "#" as the href for links that execute JavaScript. Use "javascript:void(0)" instead. That way the page doesn't scroll to the top when the link is clicked.
George Edison
@George: Care to explain why?
Christopher Parker
@Chris: I thought I did :)
George Edison
@George - I removed it, though if that's your biggest issue, you can add a "return false" at the end of the onclick attribute.
Duniyadnd
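Both examples in this answer depend on attributes, so one way to picture the whitelist is as a map from each allowed tag to the attribute names allowed on it. The following is only a sketch: `ALLOWED_ATTRIBUTES` and `filterAttributes` are hypothetical names, the tag list is made up, and plain objects stand in for whatever the HTML parser hands back.

```javascript
// Hypothetical whitelist: tag name -> attribute names permitted on that tag.
const ALLOWED_ATTRIBUTES = new Map([
  ["a", new Set(["href", "title"])],
  ["b", new Set()],
  ["i", new Set()],
  ["p", new Set()],
]);

// Returns a new object holding only the whitelisted attributes,
// or null when the tag itself is not on the whitelist.
function filterAttributes(tagName, attrs) {
  const allowed = ALLOWED_ATTRIBUTES.get(tagName.toLowerCase());
  if (allowed === undefined) return null;

  const kept = {};
  for (const [name, value] of Object.entries(attrs)) {
    const lower = name.toLowerCase();
    // onclick, style and anything else not listed is simply dropped.
    if (allowed.has(lower)) kept[lower] = value;
  }
  return kept;
}

console.log(filterAttributes("a", {
  href: "http://example.com/",
  onclick: "alert('hi');",
}));
```

This removes the onclick handler from the first example, but an href of "javascript:alert('hi');" passes through unchanged, so the attribute value still needs its own check.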
+1  A: 

you can do this:

<a href="javascript:(function(){
    alert('hello');
})()">Hello</a>

if you want to get really crazy

Edit: I like this even better

<a onclick="alert(eval({Crazy:function(){alert('Hello');return 'World';}}).Crazy());">
    Crazy
</a>
Seattle Leonard
you could put whole classes in there
Seattle Leonard
You could even in theory import an external javascript file and access pretty much anything on the page and send it anywhere on the internet.
George Edison
So how do you stop this with a whitelist (I am using HTML Agility Pack) without just not allowing links?
chobo2
Use a regex to parse the href attribute and only allow fully qualified links. Something like ^http(s)?://(www\.)?(\w+\.)?(\w+)\.(com)$ I know that's not completely right, but I just wrote it on the fly.
Seattle Leonard
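As an alternative to the regex in this comment (this swaps in a URL parser, it is not what the comment proposes), the same "only fully qualified links" rule can be sketched by checking the parsed scheme against a whitelist. The sketch assumes a runtime with the standard `URL` class (Node.js or a browser); `ALLOWED_SCHEMES` and `isSafeHref` are hypothetical names.

```javascript
// Hypothetical scheme whitelist; URL.protocol includes the trailing colon.
const ALLOWED_SCHEMES = new Set(["http:", "https:"]);

// True only for absolute links whose scheme is on the whitelist.
function isSafeHref(value) {
  let url;
  try {
    url = new URL(value); // throws for relative or unparseable input
  } catch {
    return false;
  }
  // The parser lowercases the scheme, so "JaVaScRiPt:" is rejected as well.
  return ALLOWED_SCHEMES.has(url.protocol);
}

console.log(isSafeHref("http://www.example.com/"));  // true
console.log(isSafeHref("javascript:alert('hi');"));  // false
console.log(isSafeHref("#"));                        // false
```

Relative links and bare fragments are rejected here on purpose, which matches the "fully qualified" requirement but may be stricter than an editor needs.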
Yeah, that's actually what I did: I used a regex. But apparently I still have to worry about the UTF-8 Unicode encoding one, hex encoding, jav ascript, and jav ascript, so I am not sure how to deal with them.
chobo2
+2  A: 

If you're trying to write an XSS validator for user-entered HTML for production, I highly recommend you use an existing library. Even with the whitelist approach you are taking, there are many, many possible attribute values that can result in XSS. Search for "javascript:" in the XSS Cheat Sheet to see all sorts of places javascript: URIs can turn up. Here is an incomplete list:

<IMG SRC="javascript:alert('XSS');">
<INPUT TYPE="IMAGE" SRC="javascript:alert('XSS');">
<BODY BACKGROUND="javascript:alert('XSS')">
<IMG LOWSRC="javascript:alert('XSS')">

There are also ways to inject external script urls, like this:

<XSS STYLE="behavior: url(xss.htc);">

If you're writing this for your own education, then the XSS Cheat Sheet has some really great fodder for unit tests.

Annie
Nah, I am not trying to write an XSS validator... I would use the one that comes with ASP.NET MVC. I am trying to make a whitelist.
chobo2
Oh good :) But I think the cheat sheet still answers your question about all the ways JavaScript can be written.
Annie
Hmm, I am not sure how to handle a couple of these cases: the UTF-8 Unicode encoding one, hex encoding, jav ascript, and jav ascript. HTML Agility Pack helps me with the <script> ones, but I don't think it helps with these.
chobo2
Can the word javascript in these tags have any number of spaces (i.e. can you have j space a space v space a...)?
chobo2
No spaces in "javascript" -- but it can be any case, and have spaces (and looks like some other special chars) before and after "javascript".
Annie
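For the encoded variants raised in these comments, one possible approach is to normalize the attribute value before checking its scheme. The sketch below assumes the value may still contain numeric character references (decimal or hex) and that whitespace or control characters can appear anywhere in it; it removes all of them for the check only, which is deliberately conservative. `decodeNumericEntities` and `hasAllowedScheme` are hypothetical names.

```javascript
// Decode numeric character references such as &#106; or &#x6A; (semicolon optional).
function decodeNumericEntities(value) {
  return value.replace(/&#(x[0-9a-f]+|[0-9]+);?/gi, (match, body) => {
    const isHex = body[0] === "x" || body[0] === "X";
    const code = parseInt(isHex ? body.slice(1) : body, isHex ? 16 : 10);
    // Leave out-of-range references untouched instead of throwing.
    return code <= 0x10ffff ? String.fromCodePoint(code) : match;
  });
}

// True only when the normalized value starts with http:// or https://.
function hasAllowedScheme(rawValue) {
  const decoded = decodeNumericEntities(rawValue);
  // Drop spaces and ASCII control characters (tab, newline, ...) and fold case.
  // The compacted string is used for this check only, never written back out.
  const compact = decoded.replace(/[\u0000-\u0020\u007f]/g, "").toLowerCase();
  return /^https?:\/\//.test(compact);
}

console.log(hasAllowedScheme("http://www.example.com/"));          // true
console.log(hasAllowedScheme("jav&#x09;ascript:alert('XSS');"));   // false
console.log(hasAllowedScheme("&#106;avascript:alert('XSS')"));     // false
```

Because this is a whitelist check, anything it fails to decode (a named entity, for example) is rejected rather than allowed through.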