I have an ASP.NET MVC application and I'm using CKEditor for text entry. I have turned off input validation so the HTML created from CKEditor can be passed into the controller action. I am then showing the entered HTML on a web page.

I only have certain buttons on CKEditor enabled, but obviously someone could post whatever markup they want to the server. I want to be able to show the HTML on the page after the user has entered it. How can I validate the input but still be able to show the few things that are enabled in the editor?

So basically I want to sanitize everything except for a few key things like bold, italics, lists and links. This needs to be done server side.

A: 

How about AntiXSS?

Craig Stuntz
AntiXss.GetSafeHtmlFragment would be a start. That removes any scripting from the HTML. What about the rest of the markup, though? Any other HTML could still be entered. Maybe that really isn't a concern, and only XSS is. I'm not too familiar with the best practices around this; I rarely need users to input something that won't be encoded.
Josh Close
Well, the principal disadvantage of "safe" HTML is that it could be malformed. In which case, something like Tidy might be a solution.
Craig Stuntz
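For illustration, here is a minimal Python sketch of the parser-based allowlist idea that HTML sanitizer libraries follow: parse the fragment, keep only allowed tags (dropping every attribute except a checked href), escape all text, and close any tags left open so the output is well-formed. It is not AntiXSS's implementation; `ALLOWED_TAGS`, `SAFE_SCHEMES`, `AllowlistSanitizer` and `sanitize` are hypothetical names, and in an ASP.NET app a maintained library is the safer choice than rolling your own.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allowlist matching the toolbar in the question
# (bold, italics, lists, links); adjust to your CKEditor configuration.
ALLOWED_TAGS = {"p", "br", "strong", "b", "em", "i", "ul", "ol", "li", "a"}
VOID_TAGS = {"br"}
DROP_CONTENT = {"script", "style"}  # discard these tags *and* their text
SAFE_SCHEMES = ("http://", "https://", "mailto:")


class AllowlistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.open_tags = []  # stack of allowed tags we have emitted
        self.skip = 0        # > 0 while inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip += 1
            return
        if tag not in ALLOWED_TAGS:
            return  # drop the tag itself; its text still goes through handle_data
        if tag == "a":
            href = (dict(attrs).get("href") or "").strip()
            if not href.lower().startswith(SAFE_SCHEMES):
                return  # missing href, or javascript:, data:, etc.
            self.out.append('<a href="%s">' % escape(href))
        else:
            self.out.append("<%s>" % tag)  # all other attributes are dropped
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip = max(0, self.skip - 1)
            return
        if tag not in self.open_tags:
            return  # stray or disallowed closing tag
        # Close any tags left open inside this one so the output stays well-formed.
        while self.open_tags:
            open_tag = self.open_tags.pop()
            self.out.append("</%s>" % open_tag)
            if open_tag == tag:
                break

    def handle_data(self, data):
        if not self.skip:
            self.out.append(escape(data))


def sanitize(fragment):
    parser = AllowlistSanitizer()
    parser.feed(fragment)
    parser.close()
    # Close anything the user left open (e.g. a dangling <em>).
    parser.out.extend("</%s>" % t for t in reversed(parser.open_tags))
    return "".join(parser.out)
```

For example, `sanitize('<strong><em>x</strong>')` returns `'<strong><em>x</em></strong>'`, which also addresses the malformed-HTML concern without a separate Tidy pass.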
A: 

See my full answer to a similar question here:

I have found that replacing the angle brackets with encoded angle brackets solves most problems.

rick schott
How is that different from HTML encoding? The problem is, I don't want to convert all the angle brackets. Basically, there is a certain set of HTML elements I want to leave unencoded: things like bold, italics, links, and lists.
Josh Close
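To make the exchange above concrete, a small Python sketch comparing plain bracket replacement with full HTML encoding. Python's `html.escape` is used as a stand-in for the .NET encoder, and `encode_angle_brackets` is a hypothetical helper name. For typical input the two give the same result, and, as the comment points out, the tags you want to keep are neutralized along with everything else.

```python
from html import escape


def encode_angle_brackets(text):
    # The approach from the answer: replace only < and >.
    return text.replace("<", "&lt;").replace(">", "&gt;")


user_input = '<strong>Hi</strong> <script>alert(1)</script>'

# Full HTML encoding additionally encodes & and quotes; this input has neither,
# so both outputs match, and the allowed <strong> is encoded too.
print(encode_angle_brackets(user_input))
print(escape(user_input))
```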
A: 

You could create a "whitelist" of sorts for the HTML tags you'd like to allow. You could start by HTML-encoding the whole thing, then replace a series of "allowed" sequences, such as:

"&lt;strong&gt;" and "&lt;/strong&gt;" back to "<strong>" and "</strong>"
"&lt;em&gt;" and "&lt;/em&gt;" back to "<em>" and "</em>"
"&lt;li&gt;" and "&lt;/li&gt;" back to ... etc. etc.

For tags that carry attributes, such as the A tag, you could resort to a regular expression (since you'd want the href attribute to be allowed too). You would still want to be careful about XSS: an href such as javascript:alert(1) runs script when the link is clicked. Someone else already recommended AntiXSS.
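The encode-then-restore whitelist for the simple, attribute-free tags can be sketched in Python as follows. `html.escape` stands in for the .NET encoder, and `ALLOWED_SIMPLE_TAGS` and `encode_then_restore` are hypothetical names. Only the exact forms such as `&lt;strong&gt;` are turned back into markup, so a tag with extra attributes stays encoded.

```python
from html import escape

# Hypothetical allowlist of attribute-free tags to turn back into markup.
ALLOWED_SIMPLE_TAGS = ["strong", "em", "b", "i", "ul", "ol", "li", "p"]


def encode_then_restore(text):
    encoded = escape(text)  # step 1: encode everything
    for tag in ALLOWED_SIMPLE_TAGS:
        # step 2: restore only the exact, attribute-free opening and closing forms
        encoded = encoded.replace("&lt;%s&gt;" % tag, "<%s>" % tag)
        encoded = encoded.replace("&lt;/%s&gt;" % tag, "</%s>" % tag)
    return encoded
```

For example, `encode_then_restore('<strong>Bold</strong> <script>x</script>')` returns `'<strong>Bold</strong> &lt;script&gt;x&lt;/script&gt;'`. One caveat: for `<em onmouseover="x">hi</em>` the opening tag stays encoded but the closing `</em>` is restored, so the output can be unbalanced, which is the malformed-HTML concern raised in the first answer.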

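Sample regex to replace the A tags. Note that HTML encoders such as HttpUtility.HtmlEncode also encode the double quotes as &quot;, so the pattern has to match the encoded quotes:

&lt;a href=&quot;((?:[^&]|&amp;)+)&quot;&gt;

Then replace it with

<a href="$1">

and turn "&lt;/a&gt;" back into "</a>" the same way as the other tags.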

Good luck!

Funka
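A Python sketch of the link step from the last answer, under two assumptions that go beyond it: the encoder turns quotes into `&quot;` (as `html.escape` and HttpUtility.HtmlEncode do), and only `http://` and `https://` hrefs are restored, since otherwise a `javascript:` URL would get through. `ENCODED_LINK` and `restore_links` are hypothetical names.

```python
import re
from html import escape

# Hypothetical pattern: an encoded <a> whose only attribute is an http(s) href.
# The only entity allowed inside the captured URL is &amp; (a literal & in a
# query string), so the match cannot run past the closing &quot;.
ENCODED_LINK = re.compile(r"&lt;a href=&quot;(https?://(?:[^&\s]|&amp;)*)&quot;&gt;")


def restore_links(text):
    encoded = escape(text)
    encoded = ENCODED_LINK.sub(r'<a href="\1">', encoded)
    return encoded.replace("&lt;/a&gt;", "</a>")
```

For example, `restore_links('<a href="https://example.com/?a=1&b=2">ok</a>')` returns `'<a href="https://example.com/?a=1&amp;b=2">ok</a>'`, while a `javascript:` link or one with an extra `onclick` attribute stays as encoded text. As with the simple tags, its closing `</a>` is still restored, so a balancing pass (or a real sanitizer) is worth considering.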