Related: http://stackoverflow.com/questions/492515/how-to-store-lightweight-formatting-textile-markdown-in-database

I want to store formatted comments in our DB using some markup language. However, we want to allow multiple formatting languages (Markdown, Textile, reStructuredText). It seems we should store a superset of their features so that we can convert between them.

  • Will this work?
  • Is there such a superset?
  • Are there libraries to switch between them?
  • Is there a more structured format we should keep comments in, in the DB?

(Python/Google App Engine if it matters)

+5  A: 

Have you considered something simpler: storing each comment in its original form, together with an extra column recording which format it is in (Markdown, Textile, etc.)?

I would think that any superset will go wrong in one of two ways. Either it loses information, because it stores only one of the many ways a given construct can be written in a specific markup, or it becomes too complicated, because it tries to allow for every possible encoding of that construct in every allowable markup.

Mark Byers
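A minimal sketch of the "original text plus a format column" idea, using the standard-library `sqlite3` module rather than the App Engine datastore the question mentions. The table name, column names, and the three markup labels are assumptions made for illustration.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open a database and create the (hypothetical) comments table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS comments (
            id     INTEGER PRIMARY KEY,
            -- which markup the author typed the comment in
            markup TEXT NOT NULL
                   CHECK (markup IN ('markdown', 'textile', 'rst')),
            -- the comment exactly as the author typed it
            source TEXT NOT NULL
        )
        """
    )
    return conn


def add_comment(conn, markup, source):
    """Store the untouched source text and return the new row id."""
    cur = conn.execute(
        "INSERT INTO comments (markup, source) VALUES (?, ?)",
        (markup, source),
    )
    conn.commit()
    return cur.lastrowid


def get_comment(conn, comment_id):
    """Return (markup, source) for a comment, or None if it does not exist."""
    return conn.execute(
        "SELECT markup, source FROM comments WHERE id = ?",
        (comment_id,),
    ).fetchone()
```

Because the source text is never rewritten, an author who reopens a comment sees exactly what was typed; rendering is left to whichever library handles the recorded markup.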
I did think about that. The reason I went against it was that I want to be able to switch between them on the fly. But that's a separate problem, and it's probably OK to have a lossy solution there. So I think you're probably right.
Paul Biggar
@Paul: Mark gave you a good answer. However, if you still want to be able to convert between formats, try going through HTML, e.g. Markdown -> HTML -> Textile -> HTML -> reStructuredText.
Niteriter
By superset I didn't mean a language that supports all of the syntax. I meant something like HTML (maybe b, i, a, blockquote only). I would convert the reST/Markdown/etc. into HTML. I'm leaning towards this, as it would mean having one canonical format through which all conversions can go, rather than writing or finding a conversion library for each pair of formats.
Paul Biggar
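The canonical-format idea can be sketched as a hub: each markup needs only a converter to HTML and one back from HTML, so N markups need 2N converters instead of one per ordered pair. The regex rules below are toy stand-ins covering bold and italic only, and every name is hypothetical; a real system would plug in proper markup libraries.

```python
import re


def _apply(rules, text):
    """Run each (pattern, replacement) pair over the text in order."""
    for pattern, replacement in rules:
        text = re.sub(pattern, replacement, text)
    return text


# Toy rule sets: bold and italic only, no nesting, no escaping.
MARKDOWN_TO_HTML = [
    (r"\*\*(.+?)\*\*", r"<strong>\1</strong>"),  # **bold** must run first
    (r"\*(.+?)\*", r"<em>\1</em>"),              # then *italic*
]
HTML_TO_MARKDOWN = [
    (r"<strong>(.+?)</strong>", r"**\1**"),
    (r"<em>(.+?)</em>", r"*\1*"),
]
TEXTILE_TO_HTML = [
    (r"\*(.+?)\*", r"<strong>\1</strong>"),      # Textile *bold*
    (r"_(.+?)_", r"<em>\1</em>"),                # Textile _italic_
]
HTML_TO_TEXTILE = [
    (r"<strong>(.+?)</strong>", r"*\1*"),
    (r"<em>(.+?)</em>", r"_\1_"),
]

# One (to_html, from_html) pair per markup: the hub-and-spoke registry.
CONVERTERS = {
    "markdown": (MARKDOWN_TO_HTML, HTML_TO_MARKDOWN),
    "textile": (TEXTILE_TO_HTML, HTML_TO_TEXTILE),
}


def to_html(text, markup):
    return _apply(CONVERTERS[markup][0], text)


def from_html(html, markup):
    return _apply(CONVERTERS[markup][1], html)


def convert(text, source_markup, target_markup):
    """Convert between any two registered markups by way of HTML."""
    return from_html(to_html(text, source_markup), target_markup)
```

Adding reStructuredText would mean registering one more pair of rule sets, without touching the existing entries.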
I agree that it would be a good idea to go via HTML when converting between any two markups. I don't think this necessarily means that the HTML should be stored in the database. If a user submits a comment and then immediately afterwards tries to edit it, he might be disappointed to see that his carefully formatted markdown has been converted to something less readable by the automatic conversion.
Mark Byers
Hmmm. Store both? I'm not hugely worried about space efficiency. Ease of programming and clean design are more important right now.
Paul Biggar
I'd probably store both the original text of the latest edit, in the markup used to make that edit, and the rendered HTML so that it can be displayed quickly. If someone wants to edit it in a markup other than the one it was written in, I'd probably just convert on the fly, since this would be a relatively rare event. On the other hand, you could store it in all markups simultaneously if lots of different users are editing the same text in different markup languages, but the lossiness of the conversions may annoy some people.
Mark Byers
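A rough sketch of that storage policy: keep the source of the latest edit, its markup, and a cached HTML rendering, and fall back to a lossy conversion only when someone edits in a different markup. The class, the function names, and the two callable registries (`renderers`, `html_exporters`) are assumptions for illustration; the actual rendering would come from whatever markup libraries are in use.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class StoredComment:
    markup: str  # markup used for the latest edit
    source: str  # text exactly as typed in that markup
    html: str    # cached rendering, used for display


def save_edit(
    markup: str,
    source: str,
    renderers: Dict[str, Callable[[str], str]],
) -> StoredComment:
    """Render once at save time so that display never has to convert."""
    return StoredComment(markup=markup, source=source, html=renderers[markup](source))


def source_for_editing(
    comment: StoredComment,
    wanted_markup: str,
    html_exporters: Dict[str, Callable[[str], str]],
) -> str:
    """Return the text to show in the edit box for the requested markup."""
    if wanted_markup == comment.markup:
        # Common case: same markup, so the author's text comes back untouched.
        return comment.source
    # Rare case: derive the other markup from the cached HTML (may be lossy).
    return html_exporters[wanted_markup](comment.html)
```

Saving the result of an edit made in the other markup goes through `save_edit` again, so the stored source always reflects the most recent editor's choice.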