I've run into an awkward situation. I want users to be able to use Textile, but they shouldn't be able to mess with the valid HTML surrounding their entry, so I have to escape their HTML somehow.

  • html_escape(textilize("</body>Foo")) would break Textile, since the markup Textile generates gets escaped as well, while

  • textilize(html_escape("</body>Foo")) would work, but breaks various Textile features such as links (written like "Linkname":http://www.wheretogo.com/), since the quotes are turned into &quot; and are no longer detected by Textile.

  • sanitize doesn't do a better job.
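The quoting problem from the second bullet can be reproduced without Textile at all. The sketch below uses Ruby's standard ERB::Util.html_escape (which Rails' html_escape is based on) and a deliberately simplified, hypothetical regex, TEXTILE_LINK, as a stand-in for Textile's "text":url link syntax. It is an assumption for illustration only and not RedCloth's actual grammar.

```ruby
require "erb"

# Hypothetical, heavily simplified stand-in for Textile's link syntax
# ("text":url). RedCloth's real parser is far more involved.
TEXTILE_LINK = /"([^"]+)":(\S+)/

RAW_INPUT     = '"Linkname":http://www.wheretogo.com/'
# html_escape turns the double quotes into &quot; entities.
ESCAPED_INPUT = ERB::Util.html_escape(RAW_INPUT)

puts ESCAPED_INPUT
puts "raw matches link pattern:     #{TEXTILE_LINK.match?(RAW_INPUT)}"
puts "escaped matches link pattern: #{TEXTILE_LINK.match?(ESCAPED_INPUT)}"
```

Once the quotes have become &quot;, nothing that looks for a literal " can recognise the link any more, which is why escaping before textilizing loses this feature.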

Any suggestions on that one? I would prefer not to use Tidy for this problem. Thanks in advance.

A: 

It looks like Textile simply doesn't support what you want.

You really want to only allow a carefully controlled subset of HTML, but textile is designed to allow arbitrary HTML. I don't think you can use textile at all in this situation (unless it supports that kind of restriction).

What you probably need is a special "restricted" version of Textile that only allows "safe" markup (although defining what counts as safe might already be tricky). I don't know whether such a thing exists.

You might have a look at BBCode, which lets you restrict the markup that is allowed.

sleske
There's also Markdown (which Stack Overflow uses), http://daringfireball.net/projects/markdown/
David Zaslavsky
Yeah, I thought about Markdown, too. But AFAIK Stack Overflow does additional escaping (one of Jeff's blog posts pointed that out). Markdown also allows arbitrary HTML.
Marcel J.
@David Zaslavsky: Read from the official website (http://daringfireball.net/projects/markdown/syntax#overview): "For any markup that is not covered by Markdown’s syntax, you simply use HTML itself."
Vanuan
@Vanuan: True, but I don't need to be instructed to read the official website. Besides, some Markdown implementations offer a "safe mode" which will sanitize HTML in the input.
David Zaslavsky
A: 

What about using the whitelist plugin to remove invalid HTML and limit the tags that can be used?

Toby Hede
From what I read in the readme, it does exactly what `sanitize` does - it seems the functionality of that plugin was merged into Rails Edge.
Marcel J.
+4  A: 
Marcel J.
Thanks a lot!!!
Vanuan
But beware of pre and code tag exploits, e.g. (dare I risk this?) <pre onmouseover="alert('Gotcha!')">!!!!mouse trap!!!!</pre>
Noel Walters
A: 

This works for me and guards against every XSS attack I've tried including onmouse... handlers in pre and code blocks:

<%= RedCloth.new( sanitize( @comment.body ), [:filter_html, :filter_styles, :filter_classes, :filter_ids] ).to_html -%>

The initial sanitize removes a lot of potential XSS exploits including mouseovers.

As far as I can tell, :filter_html escapes most HTML tags apart from code and pre. The other filters are there because I don't want users applying any classes, ids or styles.

I just tested my comments page with your example

"</body>Foo" 

and it completely removed the rogue body tag.

I am using RedCloth version 4.2.3 and Rails version 2.3.5.

Noel Walters