I'm looking at using WMD in my project instead of my existing RadEditor. I have been reading a few posts on how to store and retrieve the data, and I want to make sure I have the concept correct before proceeding.

If my research is correct, here is what I should be doing.

  1. I should store the editor data twice (once as HTML and once as Markdown).
  2. I should run the HTML through a whitelist before storing it.
  3. I should run the HTML through AntiXSS on the way out (before displaying it).
  4. I should use the Markdown data ONLY to repopulate the editor for editing.

Can anyone confirm or deny if this is correct, and also add any useful input on the subject?

References
Reformat my code: Sanitize Html
StackOverflow: how do you store the markdown using wmd in asp net
StackOverflow: sanitize html before storing in the db or before rendering antixss library
StackOverflow: store html entities in database or convert when retrieved

A: 

So one of the ideas behind Markdown is that it will produce "safe" HTML - there should be no need for separate encoding.

More generally I would recommend storing "raw" data in the database, without transforming or sanitising it. You should always sanitise or transform as close to the rendering point as possible. It gives greater flexibility (oh, suddenly I need to render as RSS. Or JSON. Damn, I can't because I pre-formatted for HTML) and, should the sanitiser or renderer be updated, you see the effects of the update on every piece of data.

I would say store the Markdown text in the database, and then convert it when you want it rendered, using the Markdown library for this, which, in theory, should emit only safe HTML built from its safe list of tags and attributes.
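
To illustrate the "store raw, transform at render time" point, here is a minimal Python sketch (the thread itself is about ASP.NET, so treat this as language-neutral pseudocode that happens to run). The `stored_post` record and the `render_html` / `render_json` helpers are hypothetical names, and plain HTML escaping stands in for the Markdown conversion step, which would need a real Markdown library.

```python
import json
from html import escape

# Hypothetical record exactly as the user submitted it (nothing escaped or stripped).
stored_post = {"author": "rock", "body": 'Try <script>alert("hi")</script> & enjoy'}

def render_html(post):
    # Escape at the last moment, only because the target is an HTML page.
    return "<p>%s</p>" % escape(post["body"])

def render_json(post):
    # A JSON consumer gets the untouched text; json.dumps does its own escaping.
    return json.dumps({"author": post["author"], "body": post["body"]})

print(render_html(stored_post))
print(render_json(stored_post))
```

Because the stored value was never pre-formatted for HTML, the JSON output round-trips to the original text, and swapping in a better HTML renderer later affects every existing post.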

blowdart
That's a false answer. Markdown allows arbitrary HTML. It's up to the consumer of Markdown to determine what is 'good' HTML and what isn't.
George Stocker
Ah, not true. WMD's implementation does, but Markdown itself is supposed to be limited to its own tags, and produce its own valid XHTML. Of course Gruber's lack of documentation or any spec doesn't help with this.
blowdart
"So one of the ides behind Markdown is that it will produce "safe" html" - does this mean that the WMD engine already strips out unsafe html (like scripts)
rockinthesixstring
@rockinthesixstring No, WMD will not strip out unsafe HTML.
George Stocker
@George, would you mind lending some insight into the original question? I'm really looking for the "best" solution here. The app I'm building is "hopefully" going to have a large amount of loosely moderated content being submitted. I just want to make sure that what's input is clean.
rockinthesixstring
There is another problem with WMD - it's client side. So if you trust it, I can either disable JavaScript or submit using Fiddler. Stack Overflow uses MarkdownSharp to parse server side: http://code.google.com/p/markdownsharp/ Note that MarkdownSharp doesn't allow arbitrary HTML; it's "proper" Markdown, as I insinuated it should be before George assumed what WMD does was "right".
blowdart
Thanks. It looks like MarkdownSharp will do wonders for the sanitizing of the data.
rockinthesixstring
@blowdart I have a bit of experience with markdown, see my answer for my approach.
George Stocker
+2  A: 

I'm implementing Markdown in a Blog engine I'm writing (who doesn't write blog engines?), and I've also implemented Markdown in a number of customized CMSs I've written for clients.

I do it very similarly to how the Stack Overflow team does it:

  1. I use wmd.js as the client-side editor.
  2. I use MarkdownSharp for server-side processing.
  3. I use Jeff Atwood's Sanitize HTML to clean up the resulting HTML.

Here are some resources that talk about Markdown:

Bottom line:

  1. I store the post in the form it was submitted in; it's converted for display using MarkdownSharp.
  2. I sanitize the HTML using Jeff Atwood's approach (on output, not on input).
  3. I use ASP.NET MVC 'best practices' (a highly subjective term) to deal with XSS and XSRF.
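
As a rough illustration of the "whitelist on output" step, here is a minimal Python sketch. It is not Jeff Atwood's Sanitize HTML code and not what MarkdownSharp does internally; the tag list, the `WhitelistSanitizer` class, and the `sanitize` function are all assumptions made up for this example. It keeps a small set of tags, drops every attribute except an http(s) `href` on links, and discards the contents of `script` and `style` elements.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urlparse

# Assumed whitelist for illustration; a real one would be tuned to your content.
ALLOWED_TAGS = {"p", "em", "strong", "code", "pre", "blockquote", "ul", "ol", "li", "a"}
DROP_CONTENT = {"script", "style"}  # remove these tags together with their contents

class WhitelistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a script/style element

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return  # unknown tag: drop the tag itself, keep its text
        if tag == "a":
            href = dict(attrs).get("href") or ""
            if urlparse(href).scheme in ("http", "https"):
                self.out.append('<a href="%s">' % escape(href, quote=True))
            else:
                self.out.append("<a>")  # e.g. javascript: URLs lose their href
            return
        self.out.append("<%s>" % tag)  # all other attributes are discarded

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            if self.skip_depth:
                self.skip_depth -= 1
            return
        if not self.skip_depth and tag in ALLOWED_TAGS:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data))

def sanitize(html_text):
    parser = WhitelistSanitizer()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)

print(sanitize('<p onclick="x()">Hi <script>alert(1)</script><b>there</b></p>'))
```

The function would be called on the HTML produced by the Markdown converter, right before it is written into the page, so the stored text stays untouched.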
George Stocker
Two questions... 1) Why sanitize on output and not on input? and 2) What are these MVC "best practices"... is there an article to read?
rockinthesixstring
@rock You sanitize on output because you don't want to mangle what the user put in the database. By sanitizing on output you preserve the original text in case you ever decide to change what you're using to sanitize it. Regarding best practices, see "ASP.NET MVC 1.0" by Rob Conery, Scott Hanselman, Phil Haack, and Scott Guthrie. Also, Phil's blog, http://haacked.com, is a great resource for MVC items.
George Stocker
So you're not worried if a user submits stuff like javascript or malicious crap to your database? Also, doesn't sanitizing once on input lower system load over sanitizing on every output?
rockinthesixstring
There are two separate issues here. One: JavaScript only matters once you view it. JavaScript in the database doesn't matter. When someone accesses it via a webpage, then it matters. Two: SQL Injection is handled separately, via Parameterized queries or an ORM like Linq-to-SQL or Entity Framework. There is a load on output, but as you can see with a site like Stack Overflow (that gets more traffic than your site or my site ever will) you can optimize that. Do you even notice the time it takes to load? I don't. How do you think Jeff and Co. do it? They sanitize on view.
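
A minimal sketch of the second point, in Python with the standard-library `sqlite3` module standing in for SQL Server and Linq-to-SQL (the `posts` table and the `save_post` / `load_post` helpers are hypothetical). With a parameterized query the hostile text is stored as plain data: it cannot alter the SQL, and the script tag is inert until something renders it.

```python
import sqlite3

def save_post(conn, body):
    # The ? placeholder sends the value separately from the SQL text.
    cur = conn.execute("INSERT INTO posts (body) VALUES (?)", (body,))
    conn.commit()
    return cur.lastrowid

def load_post(conn, post_id):
    row = conn.execute("SELECT body FROM posts WHERE id = ?", (post_id,)).fetchone()
    return row[0] if row else None

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT NOT NULL)")

hostile = "'); DROP TABLE posts; -- <script>alert(1)</script>"
post_id = save_post(conn, hostile)
print(load_post(conn, post_id) == hostile)  # stored verbatim, table still intact
```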
George Stocker
@rock Also, what do you think `Html.Encode()` does? It doesn't worry about what you put in the database; it only does something when you try to view it. That's far easier to get right without hurting existing input.
George Stocker
Ok, thanks for all of that. Very helpful
rockinthesixstring
One more question @George. If I save raw data to the database, then I have to disable validateRequest in web.config. Does this mean that I now have to cleanse ALL input fields with AntiXSS?
rockinthesixstring
@rock You add `[ValidateInput(false)]` to the controller methods where you want to allow people to submit custom input. There are workarounds: http://weblogs.asp.net/rashid/archive/2009/02/14/asp-net-mvc-rc1-validateinput-a-potential-dangerous-request-and-the-pitfall.aspx
George Stocker
great thanks...
rockinthesixstring