I'm looking for a simple HTML sanitizer written in JavaScript. It doesn't need to be 100% XSS secure.

I'm implementing Markdown and the WMD Markdown editor (the SO master branch from GitHub) on my website. The problem is that the HTML shown in the live preview isn't filtered, as it is here on SO. I am looking for a simple, quick HTML sanitizer written in JavaScript so that I can filter the contents of the preview window.

No need for a full parser with complete XSS protection. I'm not sending the output back to the server. I'm sending the Markdown to the server, where I use a proper, full HTML sanitizer before I store the result in the database.

Google is being absolutely useless to me. I just get hundreds of (often incorrect) articles on how to filter JavaScript out of user-generated HTML in all kinds of server-side languages.

UPDATE

I'll explain a bit better why I need this. My website has an editor very similar to the one here on StackOverflow. There's a text area for entering Markdown syntax and a preview window below it that shows how the post will look after you submit it.

When the user submits something, it is sent to the server in Markdown format. The server converts it to HTML and then runs an HTML sanitizer on it to clean up the HTML. Markdown allows arbitrary HTML, so I need to clean it up. For example, the user types something like this:

<script>alert('Boo!');</script>

The MarkDown converter does not touch it since it's HTML. The HTML sanitizer will strip it so the script element is gone.

But this is not what happens in the preview window. The preview window only converts Markdown to HTML but does not sanitize it. So, the preview window will have a script element. This means the preview window is different from the actual rendering on the server.

I want to fix this, so I need a quick-and-dirty JavaScript HTML sanitizer. Something simple with basic element/attribute blacklisting and whitelisting will do. It does not need to be XSS safe because XSS protection is done by the server-side HTML sanitiser.

This is just to make sure the preview window will match the actual rendering 99.99% of the time, which is good enough for me.
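To make the idea concrete, here is a rough sketch of the kind of element/attribute whitelisting described above. It is only an illustration: the name `sanitizeHtml` and the tag and attribute lists are assumptions made up for this example, not part of WMD or any library, and the regex approach is deliberately naive (for instance, a `>` inside an attribute value will confuse it). It is not XSS-safe and is meant for a preview only.

```javascript
// Quick-and-dirty whitelist sanitizer for a live preview. NOT XSS-safe.
// ALLOWED_TAGS is a hypothetical list; mirror whatever the server-side sanitizer allows.
const ALLOWED_TAGS = {
  a: ["href", "title"],
  img: ["src", "alt", "title"],
  p: [], br: [], hr: [], b: [], i: [], em: [], strong: [],
  code: [], pre: [], blockquote: [], ul: [], ol: [], li: [],
  h1: [], h2: [], h3: []
};

// Elements that are removed together with everything inside them.
const DROP_WITH_CONTENT = /<(script|style)\b[^>]*>[\s\S]*?<\/\1\s*>/gi;

// An opening or closing tag: optional slash, tag name, raw attribute text.
const TAG = /<(\/?)([a-zA-Z][a-zA-Z0-9]*)\b([^>]*)>/g;

// Quoted attributes only: name="value" or name='value'.
const ATTR = /([a-zA-Z-]+)\s*=\s*("[^"]*"|'[^']*')/g;

function sanitizeHtml(html) {
  return html
    .replace(DROP_WITH_CONTENT, "")
    .replace(TAG, (match, slash, name, rawAttrs) => {
      const tag = name.toLowerCase();
      // Unknown tags are dropped, but their text content is kept.
      if (!Object.prototype.hasOwnProperty.call(ALLOWED_TAGS, tag)) return "";
      if (slash) return `</${tag}>`;

      const allowed = ALLOWED_TAGS[tag];
      let attrs = "";
      for (const [, attrName, value] of rawAttrs.matchAll(ATTR)) {
        const attr = attrName.toLowerCase();
        const isJsUrl = /^["']\s*javascript:/i.test(value);
        // Keep only whitelisted attributes that are not javascript: URLs.
        if (allowed.includes(attr) && !isJsUrl) attrs += ` ${attr}=${value}`;
      }
      return `<${tag}${attrs}>`;
    });
}

console.log(sanitizeHtml("<script>alert('Boo!');</script>"));
// prints an empty line
console.log(sanitizeHtml('<p onclick="x()">Hi <b>there</b></p>'));
// prints <p>Hi <b>there</b></p>
```

In the editor, something like this would run on the converter's HTML output just before it is written into the preview element, so the preview and the server rendering agree in the common cases.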

Can you help? Thanks in advance!

A: 

something like this?

EDIT: The second result was better.

It was the first result of my search for "use javascript to strip html from text".

It looks like it requires Prototype, though.

Brandon H
I don't want to strip HTML. I want to sanitize it :-)
Sander Marechal
My mistake. So you are taking the raw user-entered HTML and turning it back into Markdown? What exactly do you want the user to see in the preview? What goes to the server?
Brandon H
I am only sending the Markdown to the server. The server converts it to HTML and then runs an HTML sanitiser on it. The problem is that the preview window needs to do the same. Currently it only turns the Markdown into HTML but it doesn't sanitise the HTML. Therefore, the preview does not match the actual result.
Sander Marechal
+1  A: 

You should have a look at the one recommended in this question http://stackoverflow.com/questions/295566/sanitize-rewrite-html-on-the-client-side

And just to be sure that you don't need to do more about XSS, please review the answers to this one http://stackoverflow.com/questions/942011/how-to-prevent-javascript-injection-attacks-within-user-generated-html

Michael Dillon
Caja looks useful, but heavy. I'll have to test if it's fast enough. I doubt it though. I'm sure I am safe from XSS because the HTML I'm parsing is never sent to the server. I'm sending the original Markdown. The HTML I need to sanitize is just the preview and nobody except the user typing it will ever see it.
Sander Marechal