Are there any problems with what I am doing here? This is my first time dealing with something like this, and I just want to make sure I understand the risks of the different methods.

I am using WMD to get user input, and I am displaying it with a Literal control. Since the content is uneditable once entered, I will be storing the HTML rather than the Markdown.

string input = Server.HtmlEncode(stringThatComesFromWMDTextArea);

and then run something like this for the tags I want users to be able to use:

// unescape whitelisted tags
string output = input.Replace("&lt;b&gt;", "<b>").Replace("&lt;/b&gt;", "</b>")
                     .Replace("&lt;i&gt;", "<i>").Replace("&lt;/i&gt;", "</i>");

Edit: Here is what I am doing currently:

public static string EncodeAndWhitelist(string html)
{
    string[] whiteList = { "b", "i", "strong", "img", "ul", "li" };

    // Encode everything, then selectively un-encode the whitelisted tags.
    string encodedHTML = HttpUtility.HtmlEncode(html);
    foreach (string wl in whiteList)
        encodedHTML = encodedHTML
            .Replace("&lt;" + wl + "&gt;", "<" + wl + ">")
            .Replace("&lt;/" + wl + "&gt;", "</" + wl + ">");
    return encodedHTML;
}
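
For context, a minimal sketch of how this gets wired up (the Literal control name litComment is assumed):

// rendering the stored, sanitised HTML
litComment.Text = EncodeAndWhitelist(stringThatComesFromWMDTextArea);
// e.g. "<b>hi</b> <script>alert(1)</script>"
//   comes out as "<b>hi</b> &lt;script&gt;alert(1)&lt;/script&gt;"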
  1. Will what I am doing here keep me protected from XSS?
  2. Are there any other considerations that should be made?
  3. Is there a good list of normal tags to whitelist?
A:

If your requirements really are basic enough that you can get away with such simple string replacements, then yes, this is ‘safe’ against XSS. (However, it's still possible to submit non-well-formed content in which <i> and <b> are mis-nested or left unclosed, which could break the layout of the page the content ends up inserted into.)

But this is rarely enough. For example, <a href="..."> and <img src="..." /> are currently not allowed (the simple replacement only matches a bare tag with no attributes). If you wanted to allow these, or any other markup that carries attribute values, you'd have a whole lot more work to do. You might then approach it with regex, but that gives you endless problems with accidental nesting and replacement of already-replaced content, because regular expressions cannot reliably parse HTML.

To solve both problems, the usual approach is to use an [X][HT]ML parser on the input, then walk the DOM removing all but known-good elements and attributes, then finally re-serialise to [X]HTML. The result is then guaranteed well-formed and contains only safe content.
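
For example, a rough sketch in C# with HTML Agility Pack (the whitelist contents and class name are illustrative only, and this deliberately ignores URL vetting, which comes up below):

using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class HtmlWhitelister
{
    // Illustrative whitelist: element name -> allowed attribute names.
    static readonly Dictionary<string, string[]> Allowed =
        new Dictionary<string, string[]>(StringComparer.OrdinalIgnoreCase)
        {
            { "b", new string[0] }, { "i", new string[0] },
            { "strong", new string[0] },
            { "ul", new string[0] }, { "li", new string[0] },
            { "a", new[] { "href" } },
            { "img", new[] { "src", "alt" } },
        };

    public static string Sanitize(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);                 // parses even malformed input
        Clean(doc.DocumentNode);            // prune the tree in place
        return doc.DocumentNode.InnerHtml;  // re-serialise the cleaned DOM
    }

    static void Clean(HtmlNode parent)
    {
        // Iterate over a copy so removals don't disturb the enumeration.
        foreach (HtmlNode node in parent.ChildNodes.ToList())
        {
            if (node.NodeType == HtmlNodeType.Comment)
            {
                node.Remove();              // comments can hide payloads
            }
            else if (node.NodeType == HtmlNodeType.Element)
            {
                string[] okAttrs;
                if (!Allowed.TryGetValue(node.Name, out okAttrs))
                {
                    node.Remove();          // unknown element: drop it entirely
                    continue;
                }
                // Strip attributes not whitelisted for this element
                // (onclick, style, and friends all disappear here).
                foreach (HtmlAttribute attr in node.Attributes.ToList())
                    if (!okAttrs.Contains(attr.Name, StringComparer.OrdinalIgnoreCase))
                        node.Attributes.Remove(attr.Name);
                Clean(node);                // recurse into surviving children
            }
            // plain text nodes pass through
        }
    }
}

Because the parser has already normalised the tree, the output is well-formed regardless of what was submitted.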

bobince
So, assuming I wanted something more robust, what would you suggest for the parsers you mentioned? Could HTML Agility Pack handle it? Isn't there something that does all this already?
Blankasaurus
Yes, HTML Agility Pack is a good choice. Once you've got the DOM parsed, it's a relatively trivial exercise to write a recursive function that removes all but known-good elements/attributes from the tree. Also, if you allow `href`/`src`/etc., remember to check the URLs for known-good schemes like `http`/`https`, to avoid injection through `javascript:` URLs and the like; a minimal check is sketched below.
bobince
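
A minimal version of that URL check might look like this (the helper name is assumed; it rejects relative URLs for simplicity):

static bool IsSafeUrl(string url)
{
    // Accept only absolute http/https URLs; javascript:, data:, vbscript:
    // and anything unparseable all fail the check.
    Uri parsed;
    return Uri.TryCreate(url, UriKind.Absolute, out parsed)
        && (parsed.Scheme == Uri.UriSchemeHttp || parsed.Scheme == Uri.UriSchemeHttps);
}

In the attribute-stripping loop sketched in the answer, any `href`/`src` whose value fails this check would then be removed along with the other disallowed attributes.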