views:

2918

answers:

5

I'd like to create a Word document using Python, however, I want to re-use as much of my existing document-creation code as possible. I am currently using an XSLT to generate an HTML file that I programatically convert to a PDF file. However, my client is now requesting that the same document be made available in Word (.doc) format.

So far, I haven't had much luck finding any solutions to this problem. Is anyone aware of an open source library (or *gulp* a proprietary solution) that may help resolve this issue?

NOTE: All possible solutions must run on Linux. I believe this eliminates pywin32.

A: 

Can you write is as the WordML XML files and zip it up into the .docx format? All your client would need is the Word 2007 filter if they aren't on Office 2007 already.

There are many examples out there.

You can also load XML directly into Word, starting with 2003, or so I've been told.

lavinio
Unfortunately, this option is not ideal. From what I can tell, I'd need to convert my data into WordML to maintain the formatting of the document.
Huuuze
+10  A: 

A couple ways you can create Word documents using Python:

EDIT:

Since COM is out of the question, I suggest the following (inspired by @kcrumley's answer):

Using the UNO library to automate Open Office from python, open the HTML file in OOWriter, then save as .doc.

codeape
Wow, you hit 2 of the same 3 ideas I was going to say (COM and RTF). Thanks for saving me the time. :)
kcrumley
+1 for suggesting .RTF instead of .DOC
Hardwareguy
Unfortunately, .doc is required. No RTF.
Huuuze
+2  A: 

1) If you want to just stick another step on the end of your current pipeline, there are several options out there now for converting PDF files to Word files. I haven't tried 123PDFConverter, but the CNET Editors recommend it (same link); it has a free trial; and it supports automation. As with any 3rd-party file converter, your mileage may vary, depending how complicated your PDFs are, and how good the software actually is.

2) Building on codeape's COM automation suggestion, if you COM automate Word, you can open your actual HTML file in Word, and call the "Save As" command, to save it as a DOC file.

kcrumley
+1  A: 

I have had to do something similar with python as well. It is far more manual work than I want, but documents created with pyRTF were causing Word and OpenOffice to crash and I didn't have the motivation to try to figure it out.

I have found it simplest (but not ideal) to create a Word document template with the styles I want. Then my Python creates an HTML file whose <p> styles are labeled after the Word styles. Then I open the HTML file in Word and open the template in Word. I cut and paste all text from the HTML file into the template, and Word re-formats it all according to the styles I had set up previously. That works for the occasional file in my situation. It might not work for your situation. FYI.

+1  A: 

You can use Aspose.Words for .NET. It will allow to create word documents as well as convert them to PDF. Although it is a .NET component, it is possible to invoke it from Python.

romeok
Unfortunately, this will not run on Linux.
Huuuze