I have a set of about 400 Word documents that are part of a Quality Management System. Word is causing me a lot of grief because a) it handles images in large documents poorly, b) the layout sometimes breaks, and c) it is cumbersome to configure the documentation for different clients.

I can convert single documents by saving them as XML, HTML, or text and then converting them manually into LaTeX, but that is not feasible for 400 documents. I know that I can print Word documents directly to PDF with tools like PrimoPDF, but that is not flexible enough because I need to modify the content.

Is there a way to keep the structure of the document (plain text, headings, tables, images) and transform it into XML? Afterwards I would like to transform the XML into HTML, LaTeX, and PDF according to the choices of our clients, and also modify the content. Is XSLT the way to go for transforming the XML into the other formats?

Thanks for any advice.

+1  A: 

For batch converting MS Word to something else you might have a look at OpenOffice.org. OpenOffice has a (command line) batch mode for mass conversions. You can also have a look at JodConverter which converts documents using just that mechanism.

That way you could mass convert Microsoft Word documents to some other format OpenOffice.org supports: perhaps text, perhaps RTF, perhaps OpenOffice XML.

You then have a hopefully easier format to convert to LaTeX.
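
As a rough illustration of the batch idea, here is a minimal Python sketch that calls the office suite once per file. It assumes a LibreOffice-style `soffice` binary on the PATH that accepts `--headless --convert-to`; older OpenOffice.org builds may not support these flags and may need a macro or JodConverter instead. The function names and the `*.doc` pattern are made up for this example.

```python
import subprocess
from pathlib import Path


def build_convert_command(source, out_dir, target_format="odt", office_binary="soffice"):
    """Return the argument list for one headless conversion.

    Assumption: the installed binary understands the LibreOffice-style
    --headless / --convert-to / --outdir options.
    """
    return [
        office_binary,
        "--headless",
        "--convert-to", target_format,
        "--outdir", str(out_dir),
        str(source),
    ]


def convert_all(source_dir, out_dir, target_format="odt"):
    """Convert every .doc file in source_dir and return the files processed."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    converted = []
    for source in sorted(Path(source_dir).glob("*.doc")):
        # check=True stops the batch at the first file the converter rejects
        subprocess.run(build_convert_command(source, out_dir, target_format), check=True)
        converted.append(source)
    return converted
```

Calling `convert_all("qms_docs", "converted")` (both directory names are placeholders) would leave one `.odt` file per Word document in the output directory, ready for a further transformation step.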

Search for Word and OpenOffice right here on Stack Overflow; you'll find results like this one about Word-to-HTML conversion.

extraneon
Thanks. JodConverter sounds good. I will give it a try.
da8
+1  A: 

You could convert your documents to Word 2007. Office 2007 documents are ZIP archives of XML files: just change the file extension to .zip and unzip. Also, Microsoft publishes an API for working with Office 2007 documents that is higher-level than working with the XML tags.
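
To make the "it's just a zip" point concrete, here is a small Python sketch that reads the main document part straight out of a .docx file and collects the text of each paragraph. It uses only the standard library rather than Microsoft's API; the function name `docx_paragraphs` is invented for this example, and the sketch ignores formatting, tables, and images.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by WordprocessingML elements such as <w:p> and <w:t>
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_paragraphs(path_or_file):
    """Return the plain text of each paragraph in a .docx file."""
    with zipfile.ZipFile(path_or_file) as archive:
        # The body of the document lives in word/document.xml inside the archive
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text is spread over one or more <w:t> runs
        text = "".join(node.text or "" for node in para.iter(f"{{{W_NS}}}t"))
        paragraphs.append(text)
    return paragraphs
```

From here the same XML tree could be fed to XSLT or walked by hand to emit XHTML or LaTeX; headings, for instance, are ordinary paragraphs that carry a style reference in their properties.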

John D. Cook
Thanks. We are using an older version so far, but I have asked for Office 2007 in order to explore the path MS Office 2007 -> XML -> XML, XHTML, LaTeX and PDF.
da8
A: 

There is advice on Word <--> LaTeX conversions at TUG (TeX User Group):

http://www.tug.org/utilities/texconv/pctotex.html

that may be worth having a look at to see if any of the suggestions and methods meet your requirements.

mas
A: 

Not sure how well it works, but there is Word2tex.

Mica