Right now I'm generating HTML with a Perl script and then manually converting it to DOC in OpenOffice. (Strictly speaking, I have to copy the content, create a new "Text document", paste, and save, because OpenOffice treats HTML and DOC as separate document types, but that detail isn't essential.) It's very inconvenient.

Is there an automated way to convert HTML to a decent DOC? Alternatively, is there some other nice text-based format, like HTML, that I can generate from a script and convert to DOC automatically?

(I'm on OS X.)

+1  A: 

I can't help you get to .doc, but have you seen the Open XML Format SDK from Microsoft? This will allow you to generate Office 2007 format documents (.docx, .xlsx etc) from .NET code.

Theoretically you may have some luck with this under Mono on OS X, as it doesn't require an installation of Office 2007 (for Windows) to function.
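For a sense of what ends up inside a .docx, here is a minimal sketch that skips the SDK entirely and builds a bare package with Python's standard library. A .docx is a ZIP archive of XML parts; the function names and the output file name below are made up for illustration, and this is not how the SDK itself works internally.

```python
import zipfile
from xml.sax.saxutils import escape

# Namespace of the main WordprocessingML document part.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Declares the content type of every part in the package.
CONTENT_TYPES = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n'
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    '<Default Extension="rels" '
    'ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
    '<Default Extension="xml" ContentType="application/xml"/>'
    '<Override PartName="/word/document.xml" '
    'ContentType="application/vnd.openxmlformats-officedocument.'
    'wordprocessingml.document.main+xml"/>'
    '</Types>'
)

# Points the package at its main document part.
ROOT_RELS = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n'
    '<Relationships '
    'xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    '<Relationship Id="rId1" '
    'Type="http://schemas.openxmlformats.org/officeDocument/2006/'
    'relationships/officeDocument" Target="word/document.xml"/>'
    '</Relationships>'
)


def build_document_xml(paragraphs):
    """Return word/document.xml with one plain paragraph per input string."""
    body = "".join(
        "<w:p><w:r><w:t>%s</w:t></w:r></w:p>" % escape(text)
        for text in paragraphs
    )
    return (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n'
        '<w:document xmlns:w="%s"><w:body>%s</w:body></w:document>'
        % (W_NS, body)
    )


def write_docx(path, paragraphs):
    """Write the three parts a minimal .docx package needs (hypothetical helper)."""
    with zipfile.ZipFile(path, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("[Content_Types].xml", CONTENT_TYPES)
        package.writestr("_rels/.rels", ROOT_RELS)
        package.writestr("word/document.xml", build_document_xml(paragraphs))


if __name__ == "__main__":
    write_docx("example.docx", ["First paragraph.", "Profit & loss < forecast."])
```

Anything beyond plain paragraphs (styles, tables, images) needs more parts and more markup, which is where a library starts to pay off.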

tomfanning
For compatibility I need DOCs, not DOCXs (personally I'd rather just use HTML, but that's the world we live in). Correct me if I'm wrong, but as I understand it this SDK doesn't really solve my problem: it just replaces the easy problem of generating HTML (or XML) with the hard problem of doing the same through a C# API, and I'm still no closer to DOCs than I was before (unless DOCX is too difficult for normal scripts to generate and the SDK does something nontrivial).
taw
Well, if you rigidly need .doc and there's no way to coerce your client into accepting .docx, then this isn't the solution for you, as I stated. The SDK has a tool that you can feed an existing document into, and it will generate the C# needed to build that document _from scratch_. Generating .docx is certainly easier than generating .doc, but it is indeed still non-trivial.
tomfanning
A: 

Not sure if this is what you want, but you can fairly easily generate WordML documents with code. WordML is the Word 2003 XML file format. It's NOT the same thing as the Office 2007 Open XML formats. A WordML document is a single file that's not too hard to create if you're just doing fairly basic formatting. You could generate it directly rather than creating the HTML first. You can name the files with a .DOC extension, and Word 2003 and later will open them just fine. You can resave them as real .DOC files if you want.

Here's the online WordML reference. I can send you some sample code if you'd like. http://msdn.microsoft.com/en-us/library/aa212812%28office.11%29.aspx
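To give an idea of how little is needed for basic formatting, here is a minimal sketch in Python rather than Perl. It writes a Word 2003 XML document with one bold paragraph and one plain one. The helper names and the output file name are made up for illustration; only the element names and namespace come from the WordML format.

```python
from xml.sax.saxutils import escape

# Namespace used by Word 2003 XML (WordML) documents.
WORDML_NS = "http://schemas.microsoft.com/office/word/2003/wordml"


def paragraph(text, bold=False):
    """Return one WordML paragraph (<w:p>) holding a single run of text."""
    run_props = "<w:rPr><w:b/></w:rPr>" if bold else ""
    return "<w:p><w:r>%s<w:t>%s</w:t></w:r></w:p>" % (run_props, escape(text))


def build_wordml(paragraphs):
    """Build a complete WordML document from (text, bold) pairs."""
    body = "".join(paragraph(text, bold) for text, bold in paragraphs)
    return (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n'
        # This processing instruction tells Windows to open the file in Word.
        '<?mso-application progid="Word.Document"?>\n'
        '<w:wordDocument xmlns:w="%s">'
        "<w:body>%s</w:body>"
        "</w:wordDocument>\n" % (WORDML_NS, body)
    )


if __name__ == "__main__":
    document = build_wordml([
        ("Monthly report", True),
        ("Profit & loss < forecast.", False),
    ])
    # Saved with a .doc extension so Word 2003 and later open it directly.
    with open("report.doc", "w", encoding="utf-8") as handle:
        handle.write(document)
```

The text is XML-escaped before it goes into `<w:t>`, so characters such as `&` and `<` in the source data don't break the document.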

If you really want to create a general file format that can be converted into other formats, creating an XSL-FO file might be the way to go. There are a number of products out there that can take XSL-FO and transform it into other formats, such as Word and PDF.

Tom Winter
A: 

We use the Aspose components, which are available for .NET and Java. With the Java version you should be able to use them on OS X, too.

You have to purchase the components (they are not free), but aside from that, they are really great.

Uwe Keim