
I am looking for a tool or set of tools to convert between file formats D and M where

  • D is a format handled by MS Word; in order of preference: docx, doc, rtf
  • M is a lightweight markup language such as Markdown, Textile or txt2tags; it can be an esoteric one
  • there is a way to generate HTML from M
  • conversion is two-way: both from D to M and from M to D
  • UTF-8 encoding is handled properly
  • the content is simple: paragraphs, some simple formatting like bold and italics, maybe lists
  • the tools are platform-independent

What I've found so far:

  • TeX, LaTeX -- too heavyweight
  • docx2txt -- too lightweight; it supports no formatting at all
  • HTML -- MS Word produces bloated HTML
  • a few one-way conversions, like doc to MediaWiki
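docx2txt drops formatting, but for content this simple, bold and italics can be recovered directly: a .docx file is a ZIP archive whose main text lives in `word/document.xml`, where each paragraph is a `w:p` element and each run `w:r` may carry `w:b` (bold) and `w:i` (italic) run properties. The following is only a minimal sketch using Python's standard library, assuming Markdown as M; the function names are illustrative, and it ignores lists, styles, tables and comments, so it is not a general converter.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the WordprocessingML elements in word/document.xml
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def _is_on(rpr, tag):
    """True if run property `tag` ('b' or 'i') is present and not switched off."""
    if rpr is None:
        return False
    el = rpr.find(W + tag)
    if el is None:
        return False
    # e.g. <w:b w:val="0"/> explicitly turns bold off
    return el.get(W + "val", "true") not in ("0", "false", "off")


def docx_to_markdown(docx_bytes):
    """Turn each w:p paragraph into a Markdown paragraph, keeping bold and italics."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(W + "p"):
        parts = []
        for r in p.iter(W + "r"):
            # A run can hold several w:t text nodes; join them
            text = "".join(t.text or "" for t in r.iter(W + "t"))
            if not text:
                continue
            rpr = r.find(W + "rPr")
            if _is_on(rpr, "b"):
                text = "**" + text + "**"
            if _is_on(rpr, "i"):
                text = "*" + text + "*"
            parts.append(text)
        if parts:  # skip empty paragraphs
            paragraphs.append("".join(parts))
    return "\n\n".join(paragraphs)
```

Usage would be along the lines of `docx_to_markdown(open("review.docx", "rb").read())` (the file name is hypothetical). Reviewer comments are stored separately, in `word/comments.xml`, and are not read here; runs whose text begins or ends with a space would also need that space moved outside the markers to stay valid Markdown emphasis.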

UPDATE:

The use case is a document workflow between technical and non-technical people:

  • I, the technical guy, edit a document in plain text, put it into version control, etc.
  • I send it to my manager or other non-technical people
  • They add comments and make changes to it in Word, then send it back to me
  • I want to simply grok their changes, make my own changes and put the result into version control, without having to use Word
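Once both versions exist in M, the "grok their changes" step needs no Word at all: a line diff of the two text files shows what the reviewers changed. A minimal sketch with Python's standard `difflib`, assuming both versions are already available as strings in M (getting the reviewer's copy out of Word is the open question here); the `show_changes` name and the file labels are illustrative.

```python
import difflib


def show_changes(mine, theirs, name="document.md"):
    """Return a unified diff between my version and the reviewer's version,
    both given as plain-text strings in the markup format M."""
    diff = difflib.unified_diff(
        mine.splitlines(keepends=True),
        theirs.splitlines(keepends=True),
        fromfile=name + " (mine)",
        tofile=name + " (theirs)",
    )
    return "".join(diff)  # empty string when nothing changed
```

Because a Word paragraph usually comes back as one long line, a word-level view such as `git diff --word-diff` may be easier to read once the file is under version control.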
A: 

Adam, I've used docx4j to convert docx to HTML, edit the HTML in CKEditor, and then convert the HTML back to docx with docx4j. My process made some assumptions about the CSS (i.e. it was designed to handle docx4j's clean HTML and editing in CKEditor).

You don't say: is there a way to generate M from HTML?
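For content limited to paragraphs, bold, italics and simple lists, mapping clean HTML to a lightweight markup is not much work. A minimal sketch with Python's standard `html.parser`, assuming Markdown as M and assuming bold and italics arrive as `<b>`/`<strong>` and `<i>`/`<em>` tags; HTML that expresses formatting through CSS classes or inline styles, nested lists, links or tables would need more handling, and literal `*` characters in the text are not escaped. The class and function names are illustrative.

```python
import re
from html.parser import HTMLParser


class HtmlToMarkdown(HTMLParser):
    """Map <p>, <li>, <b>/<strong> and <i>/<em> to Markdown; other tags are dropped."""

    MARKERS = {"b": "**", "strong": "**", "i": "*", "em": "*"}
    BLOCKS = ("p", "li")

    def __init__(self):
        super().__init__(convert_charrefs=True)  # &amp; etc. arrive as plain text
        self.blocks = []   # finished paragraphs and list items
        self.current = []  # text pieces of the block being built

    def finish_block(self):
        text = "".join(self.current).strip()
        if text:
            self.blocks.append(text)
        self.current = []

    def handle_starttag(self, tag, attrs):
        if tag in self.BLOCKS:
            self.finish_block()
            if tag == "li":
                self.current.append("- ")
        elif tag in self.MARKERS:
            self.current.append(self.MARKERS[tag])

    def handle_endtag(self, tag):
        if tag in self.BLOCKS:
            self.finish_block()
        elif tag in self.MARKERS:
            self.current.append(self.MARKERS[tag])

    def handle_data(self, data):
        # Collapse runs of whitespace, as a browser would render them
        self.current.append(re.sub(r"\s+", " ", data))


def html_to_markdown(html):
    parser = HtmlToMarkdown()
    parser.feed(html)
    parser.close()
    parser.finish_block()  # emit any trailing text outside a block tag
    return "\n\n".join(parser.blocks)
```

Every block is separated by a blank line, so lists come out as "loose" Markdown lists; that keeps the sketch short and still renders as a list.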

plutext
A: 

This is probably hard to do two-way, since you will have impedance mismatches between the various formats.

The best option I can think of would be a sort of wiki/Word hybrid. Maybe you can get Google Wave to do that for you?

Another solution that might work is a CMS like Plone (did they ever add WYSIWYG capability? I stopped caring after version 1). Keep your documents there and let the system handle changes, annotations, etc. You can automate retrieval of the source (which should be reStructuredText) and commit that to your source control if you have to.

Daren Thomas