I want to allow users to create tiny templates that I then render in Django with a predefined context. I am assuming the Django rendering itself is safe (I asked a question about this before), but there is still the risk of cross-site scripting, and I'd like to prevent this. One of the main requirements of these templates is that the user should have some control over the layout of the page, not just its semantics. I see a couple of solutions:

  • Allow the user to use HTML, but filter out dangerous tags and attributes manually in the final step (things like <script> and <a onclick='..'>). I'm not so enthusiastic about this option, because I'm afraid I might overlook some tags. Even then, the user could still use absolute positioning on <div> elements to mess up a thing or two on the rest of the page.
  • Use a markup language that produces safe HTML. From what I can see, with most markup languages I could strip any HTML from the input and then process the result. The problem with this is that most markup languages are not very powerful layout-wise. As far as I could see, there is no way to center elements in Markdown, not even in ReST. The pro here is that some markup languages are well documented, and users might already know how to use them.
  • Come up with some proprietary markup. The cons I see here are pretty much all implied by the word proprietary.

So, to summarize: is there some safe and easy way to "purify" HTML (that is, to prevent XSS), or is there a reasonably ubiquitous markup language that gives some control over layout and styling?


+1  A: 

There's the PHP-based HTML Purifier. I have not used it myself yet, but I have heard very good things about it. They promise a lot:

HTML Purifier is a standards-compliant HTML filter library written in PHP. HTML Purifier will not only remove all malicious code (better known as XSS) with a thoroughly audited, secure yet permissive whitelist, it will also make sure your documents are standards compliant, something only achievable with a comprehensive knowledge of W3C's specifications.

Maybe it's worth a try even though it's not Python based. Update: @Matchu has found a Python based alternative that looks good too.

You'll have a lot of very difficult edge cases, though; just think about Flash embeds. Plus, malicious uses of position: absolute are extremely difficult to track down (position: relative could achieve the same effect, but it is also a completely legitimate layout tool). Maybe take a look at what eBay, for example, allows and doesn't allow? If anybody has the necessary experience to know what's dangerous and what isn't from millions of examples, they do.

Related resources on eBay:

From what I found, they don't seem to publish their internal HTML blacklists, but output an error message if forbidden code is found. (Probably a wise move on their part, but unfortunate for the purposes of this question.)

Pekka
That would be a solution; however, I am looking for something that works in Python. I found this: http://www.powerhousemuseum.com/dmsblog/index.php/2008/08/21/powerhouse-releases-a-python-html-sanitiser-for-developers-to-use-bsd-license/ though it does not seem like it's ready to use yet.
Noio
+1  A: 

Seeing Pekka's answer, I tried to quickly Google an HTML Purifier equivalent in Python. Here's what I came up with: Python HTML Sanitizer. At first glance, it looks pretty good to me.
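
To give a rough idea of the whitelist approach such sanitizers take, here is a minimal sketch using only the standard library. It is not the linked sanitizer's code: the tag and attribute lists are made up for illustration, and it does nothing about CSS tricks such as absolute positioning. A maintained, audited library is still the safer choice.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allowlist for illustration; a real policy needs careful review.
ALLOWED_TAGS = {"p", "b", "i", "em", "strong", "ul", "ol", "li", "br", "a"}
ALLOWED_ATTRS = {"a": {"href"}}
VOID_TAGS = {"br"}
SAFE_SCHEMES = ("http://", "https://", "mailto:")


class AllowlistSanitizer(HTMLParser):
    """Re-emit only allowlisted tags and attributes; drop everything else."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return
        kept = []
        for name, value in attrs:
            if value is None or name not in ALLOWED_ATTRS.get(tag, set()):
                continue  # drops onclick, style, and anything unlisted
            if name == "href" and not value.strip().lower().startswith(SAFE_SCHEMES):
                continue  # drops javascript: and other unlisted URL schemes
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth and tag in ALLOWED_TAGS and tag not in VOID_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text outside <script>/<style> is kept, but always escaped.
        if not self.skip_depth:
            self.out.append(escape(data))


def sanitize(html_text):
    parser = AllowlistSanitizer()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

For example, `sanitize('<p onclick="evil()">Hi <script>alert(1)</script><b>there</b></p>')` keeps the paragraph and bold text but drops the handler and the script.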

Matchu
+1  A: 

"Use a markup language that produces safe HTML."

Clearly, the only sensible approach.

"The problem with this is that most markup languages are not very powerful layout-wise."

False.

"no way to center elements in ReST."

False.

Centering is a style -- a CSS feature -- not a markup feature.

  1. The way to center is to assign a CSS class to a piece of text. The .. class:: directive does this.

  2. You can also define your own interpreted text role, if that's necessary for specifying an inline class on a piece of <span> markup.
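
A small sketch of the first point, assuming the third-party docutils package is installed. The class name `centered` is just an example; the site's own stylesheet would have to define it (for instance with `text-align: center`). The two settings shown turn off file inclusion and raw HTML pass-through, which seems prudent for user-supplied input.

```python
USER_SOURCE = """\
.. class:: centered

This paragraph gets the centered class.
"""

# Settings that switch off file access and raw HTML pass-through,
# both of which are unsafe for untrusted input.
SAFE_SETTINGS = {
    "file_insertion_enabled": False,
    "raw_enabled": False,
}


def render_user_rst(source):
    """Return the HTML body fragment for a piece of user-supplied ReST."""
    # Imported here so the sketch can be read without docutils installed.
    from docutils.core import publish_parts  # third-party: pip install docutils

    parts = publish_parts(
        source,
        writer_name="html",
        settings_overrides=SAFE_SETTINGS,
    )
    return parts["body"]
```

Calling `render_user_rst(USER_SOURCE)` should give a `<p>` element carrying `class="centered"`, which the page's CSS can then style however the site allows.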

S.Lott
Thank you! I had missed the `.. class::` directive in the ReST specs. You might agree with me, though, that markup languages are made for formatting, not for doing layout. Being able to set a CSS class makes everything possible, of course.
Noio
Markup languages should only provide the semantics and structure. Tags like <i>, <b> and <tt> have no semantic meaning, and do not belong. Presentation (font choice, color, layout, etc.) is a CSS issue.
S.Lott
A: 

You are overlooking server-side security issues. You need to be very careful that users can't use the template's import or include mechanism to access files they don't have permission to read.

The bigger challenge is to keep the template system out of infinite loops and runaway recursion. This is an obvious threat to system performance, and depending on the implementation and deployment setup, the server may never time out. With a finite number of Python threads at your disposal, repeated calls to a misbehaving template could quickly bring your site down.
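
One way to rule out both problems is to reject, before rendering, any template that contains more than plain variable placeholders. The following is only a sketch of such a check, not part of Django; the function name and the exact policy (bare `{{ name }}` only, no filters, no dotted lookups, no block tags) are assumptions.

```python
import re

# Matches a bare variable such as {{ name }} or {{name}}; no filters, no dots.
VARIABLE = re.compile(r"\{\{\s*[A-Za-z_][A-Za-z0-9_]*\s*\}\}")

# Delimiters that should never remain once bare variables are removed.
TEMPLATE_MARKERS = ("{%", "%}", "{{", "}}", "{#", "#}")


def is_variables_only(template_source):
    """True if the template uses nothing but plain {{ variable }} tags."""
    remainder = VARIABLE.sub("", template_source)
    # Anything template-like left over is a block tag, a comment,
    # or a malformed/complex variable expression, so reject it.
    return not any(marker in remainder for marker in TEMPLATE_MARKERS)
```

With this check, `is_variables_only("<h1>{{ title }}</h1>")` is true, while a template containing `{% include "secret.html" %}` or `{{ user.password }}` is rejected.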

mikerobi
I am limiting the template tags to variable includes (`{{var}}`). The context where the template is rendered consists of only strings. I think that makes it more or less safe. I asked about this before (see question). Do you see an exception?
Noio