I want to use Markdown for my website's commenting system but I have stumbled upon the following problem: What should I store in the database - the original comment in Markdown, the parsed comment in HTML, or both?

I need the HTML version for viewing and the Markdown version if the user needs to edit his comment. If I store the Markdown version, I have to parse the comments at runtime. If I store the HTML version, I need to convert the comment back to Markdown when the user needs to edit it (I found Markdownify for this but it isn't flawless). If I store both versions, I'm doubling the used space.

What would be the best option? Also, how does Stack Overflow handle this?

+5  A: 

Store both. It goes against the rules for database normalization, but I think it's worth it for the speed optimisation in this case - parsing large amounts of text is a very slow operation.

You only need to store it twice, but you might need to serve it thousands of times, so it's a space-time trade-off.
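
A minimal sketch of the store-both approach, using an in-memory SQLite table. The table and column names (`comments`, `body_markdown`, `body_html`) and all function names are illustrative assumptions, and `render_markdown` is a deliberately tiny stand-in (it escapes HTML and handles `**bold**` only), not a real Markdown parser:

```python
import html
import re
import sqlite3


def render_markdown(source: str) -> str:
    """Stand-in renderer (hypothetical): escapes HTML, then converts **bold** only."""
    escaped = html.escape(source)
    return re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", escaped)


def open_db() -> sqlite3.Connection:
    """Create an in-memory database holding both versions of each comment."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE comments ("
        " id INTEGER PRIMARY KEY,"
        " body_markdown TEXT NOT NULL,"
        " body_html TEXT NOT NULL)"
    )
    return conn


def save_comment(conn: sqlite3.Connection, markdown_text: str) -> int:
    """Render once at write time and store the source and the HTML together."""
    cur = conn.execute(
        "INSERT INTO comments (body_markdown, body_html) VALUES (?, ?)",
        (markdown_text, render_markdown(markdown_text)),
    )
    conn.commit()
    return cur.lastrowid


def update_comment(conn: sqlite3.Connection, comment_id: int, markdown_text: str) -> None:
    """Re-render on edit so the two columns never drift apart."""
    conn.execute(
        "UPDATE comments SET body_markdown = ?, body_html = ? WHERE id = ?",
        (markdown_text, render_markdown(markdown_text), comment_id),
    )
    conn.commit()


def html_for_view(conn: sqlite3.Connection, comment_id: int) -> str:
    """Serve the pre-rendered HTML; no parsing happens on read."""
    row = conn.execute(
        "SELECT body_html FROM comments WHERE id = ?", (comment_id,)
    ).fetchone()
    return row[0]


def markdown_for_edit(conn: sqlite3.Connection, comment_id: int) -> str:
    """Serve the original Markdown for the edit form."""
    row = conn.execute(
        "SELECT body_markdown FROM comments WHERE id = ?", (comment_id,)
    ).fetchone()
    return row[0]
```

The key design point is that both columns are written in the same statement, so the HTML can always be regenerated from the Markdown if the renderer changes.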

Mark Byers
Why store the parsed version? Caching on the presentation side should negate any performance implications of having to reparse it out of the database.
Chris
I agree with @Mark and @Chris. On the one hand, storage is cheap. But, if you've got such a volume of data that it's nontrivial to store both, then you've got enough data that you should likely be caching in your presentation layer anyway, in which case @Chris is right.
Chris
For a site like SO that gets hit by thousands of users, including bots that scan almost the whole site every day, you'd need a huge amount of cache. Of course, it might be possible with just a memory cache and no DB cache... I don't know how much space you'd need for almost 500,000 questions, but I'd imagine quite a lot. I understand your point, though: of course you should cache pages and parts of pages in memory. But does a memory cache make a DB cache completely redundant, or does it make sense to have both?
Mark Byers
Store the Markdown version in the db and parse it client side using JavaScript so you don't use server resources.
Axsuul
I will accept this solution because it has greater accessibility and I currently only have 5MB of comments and doubling that size isn't much of a problem. When the size gets too big, I will probably switch to client-side parsing.
bilygates
+3  A: 

Just render the Markdown to HTML at runtime.

If your site runs into performance issues, the Markdown will be one of the last things you'll look into tweaking. And even then, I doubt it'll make sense.

Just take a look at the realtime JavaScript renderer that SO uses. It's fast.

Edit: Sorry, I should've been clearer. I meant just render in PHP. You'll save yourself a lot of headaches -- and you probably have more important things to worry about.

sumeet
On more complex pages, I've often had to wait a second or two for the renderer to finish before I can type. And why do you think they make the client do it instead of doing it on the server? Probably because they don't want to waste server resources parsing text.
Mark Byers
I was going to suggest doing it all in JavaScript instead. It makes perfect sense to do anything you can on the client. Clients are getting faster, JavaScript is getting faster. It's like free distributed computing. Why wouldn't you?
Daniel Earwicker
@Earwicker: Search engines don't interpret JavaScript, and many ordinary browsers have JavaScript disabled by default. I think it's OK to partially disable advanced functionality if JavaScript is disabled, but showing people raw Markdown instead of HTML when they have JavaScript turned off wouldn't be a good idea.
Mark Byers
@Mark Byers: good point.
Axsuul
@Mark Byers - this may be of interest: http://stackoverflow.com/questions/121108/how-many-people-disable-javascript - of course you may need to support those < 3%, but you could detect and reroute them to a page that does it on the server, the slow way. With so few users taking that route, it wouldn't be such a big load.
Daniel Earwicker
+6  A: 

Store the original markdown and parse at runtime. There are a few problems with storing the converted version in the database.

  1. If the user wants to edit their comment, you have to convert the parsed HTML back into the original Markdown.
  2. It takes space in the database (always go by the rule that if you don't need to store it, don't).
  3. Any change to the Markdown parser would require re-rendering every comment in the database, instead of simply taking effect at runtime.
Corey Hart
Point 3 is a very good one, so +1 for that. But if a user wrote a comment under an old version of the Markdown parser and you then change the parser, you might not want the change to propagate automatically to all comments, as it could break the formatting of some older ones.
Mark Byers
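
One way to address the concern in the last comment is to record which renderer version each comment was written against and keep rendering it with that version. The sketch below assumes a per-comment `stored_version` value; both renderers are tiny hypothetical stand-ins, not real Markdown parsers:

```python
import html
import re
from typing import Callable, Dict


def render_v1(source: str) -> str:
    """Hypothetical original renderer: escapes HTML, then converts **bold** only."""
    escaped = html.escape(source)
    return re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", escaped)


def render_v2(source: str) -> str:
    """Hypothetical newer renderer: everything v1 does, plus *italic*."""
    return re.sub(r"\*(.+?)\*", r"<em>\1</em>", render_v1(source))


# Every renderer version that stored comments may still depend on.
RENDERERS: Dict[int, Callable[[str], str]] = {1: render_v1, 2: render_v2}
CURRENT_VERSION = 2  # version recorded for newly written or edited comments


def render_comment(source: str, stored_version: int) -> str:
    """Render with the version the comment was written against."""
    return RENDERERS[stored_version](source)
```

Old comments keep their original formatting, while new and edited comments are saved with `CURRENT_VERSION` and pick up the new rules.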