I am using the TinyMCE editor for textarea fields in Django forms.

Now, in order to display the rich text back to the user, I am forced to use the "safe" filter in Django templates so that HTML rich text can be displayed on the browser.

If JavaScript is disabled in the user's browser, TinyMCE won't load, and the user could submit <script> or other XSS payloads through such a textarea field. That HTML would not be safe to display back to the user.

How do I take care of such unsafe HTML Text that doesn't come from TinyMCE?

+3  A: 

You are right to be concerned about raw HTML, but not just for JavaScript-disabled browsers. When considering the security of your server, you have to ignore any work done in the browser, and look solely at what the server accepts and what happens to it. Your server accepts HTML and displays it on the page. This is unsafe.

The fact that TinyMCE quotes HTML gives a false sense of security: the server trusts whatever it accepts, which it should not.

The solution to this is to process the HTML when it arrives, to remove dangerous constructs. This is a complicated problem to solve. Take a look at the XSS Cheat Sheet to see the wide variety of inputs that could cause a problem.

lxml has a function to clean HTML: http://codespeak.net/lxml/lxmlhtml.html#cleaning-up-html, but I've never used it, so I can't vouch for its quality.

Ned Batchelder
The XSS cheat sheet is a good example of why implementing HTML cleanup routines is a fairly futile process. Whitelisting HTML tags is really the only way to completely avoid that.
Paul McMillan
A: 

There isn't a good answer to this one. TinyMCE generates HTML, and Django's auto-escape specifically escapes HTML, so the markup shows up as literal text instead of being rendered.
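
To make the conflict concrete, here is a small sketch of what escaping does to markup. It uses the standard library's html.escape as a stand-in for Django's auto-escaping, which performs a comparable conversion; the sample input is made up for illustration.

```python
from html import escape

# Made-up sample of what a rich-text field might submit.
user_input = '<script>alert("xss")</script><b>bold</b>'

# Escaping turns markup characters into entities, so the browser
# shows the tags as text instead of interpreting them.
escaped = escape(user_input)
print(escaped)
# &lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;&lt;b&gt;bold&lt;/b&gt;
```

The script tag is neutralised, but so is the harmless bold tag, which is why escaped output and rich text cannot coexist without some cleaning step in between.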

The traditional solution to this problem has been to either use some non-html markup language in the user input side (bbcode, markdown, etc.) or to whitelist a limited number of HTML tags. TinyMCE/HTML are generally only appropriate input solutions for more or less trusted users.

The whitelist approach is tricky to implement without any security holes. The one thing you don't want to do is try to just detect "bad" tags - you WILL miss edge cases.
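
As a rough illustration of the whitelist idea, here is a minimal sketch built on Python's standard html.parser module. The tag list, attribute list, URL schemes, and the names WhitelistSanitizer and sanitize_html are all assumptions chosen for this example, not part of Django or TinyMCE. It does not balance tags or handle relative links, so treat it as a demonstration of the principle and not as a vetted sanitizer.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist for illustration; adjust to what your editor emits.
ALLOWED_TAGS = {"p", "br", "strong", "em", "ul", "ol", "li", "a"}
ALLOWED_ATTRS = {"a": {"href"}}
ALLOWED_SCHEMES = ("http://", "https://", "mailto:")
VOID_TAGS = {"br"}                 # tags that never get a closing tag
DROP_CONTENT = {"script", "style"}  # drop these tags AND their contents


class WhitelistSanitizer(HTMLParser):
    """Re-emit only whitelisted tags/attributes; escape all text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return  # unknown tag: drop the tag, keep its text
        kept = []
        for name, value in attrs:
            if value is None or name not in ALLOWED_ATTRS.get(tag, ()):
                continue  # drops onclick, onerror, style, ...
            if name == "href" and not value.strip().lower().startswith(ALLOWED_SCHEMES):
                continue  # drops javascript: and other schemes
            kept.append(' %s="%s"' % (name, escape(value, quote=True)))
        self.out.append("<%s%s>" % (tag, "".join(kept)))

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth and tag in ALLOWED_TAGS and tag not in VOID_TAGS:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def sanitize_html(raw):
    parser = WhitelistSanitizer()
    parser.feed(raw)
    parser.close()
    return "".join(parser.out)


dirty = '<p onclick="x()">Hi <strong>there</strong></p><script>alert(1)</script>'
print(sanitize_html(dirty))  # <p>Hi <strong>there</strong></p>
```

The key design choice is that nothing from the input is copied through verbatim: every tag and attribute is rebuilt from the whitelist, and everything else is either dropped or escaped. In a Django project, a function like this would run on the server when the form is saved, so that only cleaned HTML ever reaches the "safe" filter.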

Paul McMillan
A: 

Is anybody aware of the following snippet:

BeautifulSoup snippet

I was planning to use this snippet as a solution to my problem. It claims to sanitise HTML, allow only whitelisted HTML tags, and protect against XSS attacks.

I haven't used this code, but it looks like a reasonable approach. Beware: BeautifulSoup's weakness is that it is slow. BTW: if our answers have helped you, it's good form to vote them up and perhaps even accept one.
Ned Batchelder
A: 

You can use the template filter "removetags" and just remove 'script'.

Abe