Users can edit "articles" in my application. Each article is mastered in the DB and sent to the client as Markdown -- I convert it to HTML client side with JavaScript.

I'm doing this so that when the user wants to edit the article he can edit and POST the Markdown right back to the server (since it's already on the page).

My question is how to sanitize the Markdown I send to the client -- can I just use Rails' sanitize helper?

Also, any thoughts on this approach in general? Another strategy I thought of was rendering and sanitizing the HTML on the server, and pulling the Markdown to the client only if the user wants to edit the article.

A: 

I haven't used Markdown in Rails, but my approach would be to take the submitted Markdown and store it, as well as an HTML rendered and sanitized copy of it, in the database. That way you're not throwing any information away in your sanitization, and you're not having to re-render the Markdown every time you want to display an article.

Rails' sanitize helper should do the job. There are also a number of plugins (such as xss_shield and xss_terminate) which can be used to whitelist your output, just to make sure you don't forget to sanitize!
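As a rough illustration of the "store both copies" idea, here is a minimal plain-Ruby sketch. The `Article` class and the `RENDER` and `SANITIZE` callables are hypothetical names, and the stand-ins only escape HTML using the standard library; a real app would swap in an actual Markdown renderer and sanitizer.

```ruby
require "cgi"

# Hypothetical stand-ins so the sketch runs with the standard library only.
# A real app would plug in a Markdown renderer and an HTML sanitizer here.
RENDER   = ->(markdown) { markdown }           # assumption: no real Markdown conversion
SANITIZE = ->(html) { CGI.escapeHTML(html) }   # assumption: escape everything

class Article
  attr_reader :markdown, :html

  def initialize(markdown, render: RENDER, sanitize: SANITIZE)
    @render = render
    @sanitize = sanitize
    self.markdown = markdown
  end

  # Keep the raw Markdown untouched and refresh the safe HTML copy with it,
  # so the two stored versions never drift apart.
  def markdown=(text)
    @markdown = text
    @html = @sanitize.call(@render.call(text))
  end
end

article = Article.new("Hello <script>alert(1)</script>")
puts article.markdown  # raw source, what the editor would receive
puts article.html      # escaped copy, what the page would display
```

In Rails the same refresh could happen in a model callback before saving, with both values stored in separate columns.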

nfm
And when the user clicks "edit" on the article, download the Markdown source from the server via AJAX?
Horace Loeb
That's right. Otherwise in your Markdown -> sanitized HTML -> Markdown conversion you may be losing data that you shouldn't be.
nfm
Why store the sanitized HTML? You could just as easily use fragment caching and then expire the cache when the Markdown changes.
Omar Qureshi
+1  A: 

I follow a couple of principles:

  • store what the user types
  • sanitize on display
  • only send data that is necessary

That leads me to the alternative architecture you suggest:

  • store markdown in the database
  • on render, convert the markdown to HTML, sanitize it, and send the HTML to the browser
  • when (and if) the user chooses "Edit", request the raw markdown from the server via AJAX
  • if I have a "preview" view during edit, I try to use the server to render this as well (although you may need to remove this step if it's too slow). During preview, though, sanitizing may not be that critical.

This has been my approach and it works out pretty cleanly.
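If rendering on every request turns out to be too slow, the rendered output can be cached. The sketch below is one way that might look in plain Ruby; `RenderedCache` is a hypothetical name, and the pipeline block is a stand-in that only escapes HTML rather than running a real Markdown renderer and sanitizer.

```ruby
require "cgi"
require "digest"

# Caches rendered output under a digest of the Markdown source. Editing the
# article changes the digest, so stale HTML is never served; old entries
# simply stop being hit (a real cache would also evict them).
class RenderedCache
  attr_reader :renders

  def initialize(&pipeline)
    @pipeline = pipeline # hypothetical: Markdown in, sanitized HTML out
    @store = {}
    @renders = 0         # counts cache misses, for illustration only
  end

  def fetch(markdown)
    key = Digest::SHA256.hexdigest(markdown)
    @store.fetch(key) do
      @renders += 1
      @store[key] = @pipeline.call(markdown)
    end
  end
end

cache = RenderedCache.new { |md| CGI.escapeHTML(md) }
cache.fetch("1 < 2")
cache.fetch("1 < 2") # second call is served from the cache
puts cache.renders
```

Keying on the content rather than expiring entries by hand means nothing has to remember to invalidate the cache when an article is edited.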

ndp
+2  A: 

The other answers here are good, but let me make a few suggestions on sanitization. Rails' built-in sanitizer is decent, but it doesn't guarantee well-formedness, which tends to be half the problem. It's also fairly likely to be exploited, since it's not best-of-breed and it has a large install footprint for hackers to attack.

I believe the best and most forward-looking sanitization around today is html5lib, because it's written to parse as a browser does and it's a collaboration by a lot of leaders in the field. However, it's a bit on the slow side and not very Ruby-like.

In Ruby I recommend either Loofah, which lifts some of html5lib's sanitization code verbatim but uses Nokogiri and runs much faster, or Sanitize, which has a solid test suite and very good configurability (don't shoot yourself in the foot, though).

I just released a plugin called ActsAsSanitiled which is a rewrite of ActsAsTextiled to automagically sanitize the textiled output as well using the Sanitize gem. It's designed to give you the best of both worlds: input is untouched in the DB, yet the field always outputs safe HTML without needing to remember anything in the template. I don't use Markdown myself, but I would consider adding BlueCloth support.

dasil003
Thanks. If you do end up supporting Markdown, RDiscount is much better than BlueCloth (faster, has "Smarty" support)
Horace Loeb