I have an editor that lets users add HTML, which is stored in the database and rendered on a web page. Since this is untrusted input, I plan to use Microsoft.Security.Application.AntiXss.GetSafeHtmlFragment to sanitize the HTML.

  • Should I sanitize before saving to the database or before rendering the untrusted input into the web page?
  • Is there an advantage in including the AntiXSS source code in my project instead of just the DLL? (Maybe I can customize the white list?)
  • Which class file should I look in for the actual implementation of GetSafeHtmlFragment?
+2  A: 
  • Both
  • Only if you plan on changing it, which I would not do personally
  • The AntiXss class (since it's called as AntiXss.GetSafeHtmlFragment())
David Stratton
There is already HTML code in the database; if I sanitize before adding to the database, then I also need to sanitize the data that already exists (for consistency). But is there any value in sanitizing before adding it to the database?
To me, the primary value is that it may be some newbie who makes the next web app that reads from your DB. You never know who will take over after you've moved on to greener pastures. Also, it's just good to get in the habit of doing it so it's second nature.
David Stratton
+1 for sanitizing every time, just in case. The only thing to be aware of is a possible performance hit, in which case you can sanitize just on saving to the DB but you then need to be careful that all the info in the DB is really sanitized.
orip
A: 

You can use the ValidateRequest="true" parameter in the page directive. This way all the request data is validated, and if there's a validation problem you can always catch the error. It also helps prevent SQL injection threats and others, not only possible XSS.
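The directive mentioned above goes at the top of an .aspx page (note that in ASP.NET request validation is on by default, so this is making the default explicit):

```aspx
<%@ Page Language="C#" AutoEventWireup="true" ValidateRequest="true" %>
```

When validation rejects a request, ASP.NET throws an HttpRequestValidationException, which is the error you would catch.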

With numeric data, you can guard against integer overflow or misuse of data types with Int32.TryParse() or any other member of the TryParse family (Byte.TryParse, Int16.TryParse, ...).
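A minimal sketch of that kind of numeric validation (the input value is hypothetical):

```csharp
using System;

class InputValidation
{
    static void Main()
    {
        // Untrusted input from a query string or form field (hypothetical value)
        string raw = "12345";

        // TryParse returns false instead of throwing on malformed or
        // out-of-range input, so bad values are rejected cheaply.
        if (Int32.TryParse(raw, out int id))
        {
            Console.WriteLine("Valid id: " + id);
        }
        else
        {
            Console.WriteLine("Rejected: not a valid Int32");
        }
    }
}
```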

No need to use any other class or additional sanitizer method.

backslash17
Request validation is on by default, but you're putting a bit too much faith in it. You need to treat it as one layer of security, supplemented by whitelist validation at the point of input (you sort of touched on this re the int parsing) and output encoding. It also won't do much for you in terms of SQL injection; the statement "' or 1=1" will go right through. This might help explain it a little better: http://www.troyhunt.com/2010/05/owasp-top-10-for-net-developers-part-2.html
Troy Hunt
+3  A: 

I disagree with the selected answer for two reasons:

  1. If you store encoded data, you have to pick an encoder before you store. What happens if you stored something as HTML but later want to push it out in another format, for example as a JSON response or as part of an XML document? You now have an HTML-encoded format you must decode, then re-encode in the correct format.
  2. What if we discover a bug in the encoders and push out a new version? Because you're not encoding at the point of output, all your old data may contain things that were incorrectly encoded. You can encode again, but then you hit double-encoding issues, which can be painful to filter correctly.

Generally you encode at the point of output and treat any data coming from a data store as untrusted by default - after all, what if someone manages to edit your database directly or via SQL injection?
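Encode-at-output can be sketched like this (method names are illustrative; assumes the HttpUtility helpers from System.Web):

```csharp
using System.Web;

static class OutputEncoding
{
    // Store the raw text; pick the encoder only at render time,
    // matched to the output context.

    // For an HTML page
    public static string RenderAsHtml(string untrusted) =>
        HttpUtility.HtmlEncode(untrusted);

    // For a JavaScript/JSON string context
    public static string RenderAsJs(string untrusted) =>
        HttpUtility.JavaScriptStringEncode(untrusted);
}
```

Because nothing is encoded in the database, a bug fix in either encoder takes effect on the next render with no data migration and no double-encoding risk.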

blowdart
I came back and voted you up because I agree with your arguments, and your suggestion matches with the OWASP recommendation. Hopefully, @user102533 will switch and accept yours.
David Stratton
He's not storing encoded data, he's storing sanitized data - big difference (it's still unencoded HTML)
orip
Point 2 still stands then (and yes, I didn't pay enough attention when reading!). What if there's a bug in the sanitizer? Or you want to change the sanitization rules?
blowdart
A: 

Have a listen to OWASP podcast 67 with Jeff Williams on XSS. He talks about not sanitising or encoding before storage. The primary reason is that if (when) libraries evolve in response to new vulnerabilities, your stored data will be stuck in the old version. Of course this doesn't stop you from running any input against a whitelist at the entry point and rejecting anything outside the acceptable range.
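A whitelist check at the entry point might look like this (the pattern and field are hypothetical examples, not a general-purpose rule):

```csharp
using System.Text.RegularExpressions;

static class Whitelist
{
    // Reject anything outside the acceptable range up front.
    // Example pattern for a display-name field: letters, digits,
    // spaces, and a few punctuation marks, 1-100 characters.
    public static bool IsAcceptableDisplayName(string input) =>
        input != null &&
        Regex.IsMatch(input, @"^[a-zA-Z0-9 .,'-]{1,100}$");
}
```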

Troy Hunt