I'm looking for a .NET library that can create a Word document. I need to export HTML-based content to a Word doc (97-2003 format, not docx).

I know that there are the Microsoft Office Automation libraries and Office interop, but as far as I can tell, they require that Office is actually installed, and they do the conversion by opening Word itself. I don't want Office to be a requirement for the conversion to work.

Edit: Converting to RTF may even work, if possible.

A: 

Since the doc format specification is not open, and the interop assemblies are the Microsoft solution, I fear that they are your primary (or even only) option.

They do indeed require Office to be installed, and they open Word (although showing a window is optional).

I think Word can open HTML documents; is that an option for you?

Erik Hesselink
Bzzt! The doc specs for the Word-ML format are freely available. In fact, in my scenario, I produced a single XML file from MS Word and then just did a text replace on fields in that XML file to "dynamically generate" a new doc, in a mail-merge sort of way. Simple, easy.
Cheeso
That's the XML format, right? The question was about the binary Word format...
Erik Hesselink
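
A minimal sketch of the text-replace approach described in the comment above, written in Python for brevity (the same idea carries over to .NET). The `{{Name}}` placeholder syntax and the `fill_template` helper are assumptions made for illustration; the comment does not say how its fields were marked. Values are XML-escaped so that characters such as `&` do not break the document.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical placeholder syntax: {{FieldName}} typed into the Word document
# before it was saved as XML.
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def fill_template(template_xml, fields):
    """Return template_xml with each {{name}} replaced by the escaped field value.

    Placeholders with no matching key in `fields` are left untouched.
    """
    def substitute(match):
        name = match.group(1)
        if name not in fields:
            return match.group(0)
        return escape(str(fields[name]))

    return PLACEHOLDER.sub(substitute, template_xml)


if __name__ == "__main__":
    template = "<w:t>Dear {{Name}}, your order {{OrderId}} has shipped.</w:t>"
    print(fill_template(template, {"Name": "Smith & Sons", "OrderId": 42}))
```

One caveat: Word may split a placeholder across several runs when it saves the file, so it is worth checking the saved XML to confirm each placeholder survived as one contiguous string.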
+2  A: 

I have found that a document output as HTML but given a .doc extension will open properly formatted in Word. I tested with Word 2000 and a file with an internal style sheet.

Remou
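
A rough sketch of that idea, in Python for brevity: wrap the content in an HTML page with an internal style sheet and write it to a file whose name ends in .doc. The helper names `build_word_html` and `save_as_doc` are made up for illustration, and the charset `meta` tag is an assumption that was not part of the test described above.

```python
from html import escape
from pathlib import Path


def build_word_html(title, body_html, css):
    """Return a complete HTML page with the CSS in an internal style sheet."""
    return (
        "<html>\n<head>\n"
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">\n'
        f"<title>{escape(title)}</title>\n"
        f'<style type="text/css">\n{css}\n</style>\n'
        "</head>\n<body>\n"
        f"{body_html}\n"
        "</body>\n</html>\n"
    )


def save_as_doc(path, document):
    """Write the HTML text to `path`; the .doc extension is what makes Word claim it."""
    Path(path).write_text(document, encoding="utf-8")


if __name__ == "__main__":
    page = build_word_html(
        "Export",
        "<h1>Head</h1>\n<p>Some text</p>",
        "h1 {font-family: Arial; font-size: 12pt;}\np {font-family: Arial; font-size: 10pt;}",
    )
    save_as_doc("export.doc", page)
```

The file is still HTML on disk, so this only helps when the recipient just needs something that opens in Word, not a true binary .doc.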
+1  A: 

Using Word Automation from ASP.NET is not a good idea (see the MSKB: http://support.microsoft.com/default.aspx?scid=kb;EN-US;q257757#kb2).

If you are not using WinForms, your best option IMHO is to generate RTF, which MS Word will happily open (see the link in the already referenced article).

Good Luck!

Christopher Edwards
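
To give a feel for how little is needed for basic RTF, here is a minimal sketch in Python (the approach is the same in .NET). It only handles plain paragraphs in a single font; mapping HTML tags to RTF formatting is not attempted. The function names are made up for illustration.

```python
import struct


def rtf_escape(text):
    """Escape text for RTF: backslash, braces, newlines and non-ASCII characters."""
    out = []
    for ch in text:
        if ch in "\\{}":
            out.append("\\" + ch)
        elif ch == "\n":
            out.append("\\par\n")
        elif ord(ch) < 128:
            out.append(ch)
        else:
            # \uN takes a signed 16-bit decimal per UTF-16 code unit,
            # followed by a fallback character for older readers.
            raw = ch.encode("utf-16-le")
            for unit in struct.unpack("<%dh" % (len(raw) // 2), raw):
                out.append("\\u%d?" % unit)
    return "".join(out)


def build_rtf(paragraphs, font="Arial", size_pt=10):
    """Return an RTF document with one paragraph per item in `paragraphs`."""
    body = "".join(rtf_escape(p) + "\\par\n" for p in paragraphs)
    return (
        "{\\rtf1\\ansi\\deff0"
        "{\\fonttbl{\\f0 " + font + ";}}\n"
        # \fs is expressed in half-points, so 10pt becomes \fs20.
        "\\f0\\fs" + str(size_pt * 2) + " " + body + "}"
    )


if __name__ == "__main__":
    print(build_rtf(["The Street", "Caf\u00e9 {menu}"]))
```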
A: 

I tried just opening the HTML directly in Word, which technically works except for one thing: my HTML doc also references CSS, and when opened in Word the CSS is completely ignored, so I lose all of the formatting. I realize that I wouldn't get everything out of the CSS, but I would at least like to keep the specified fonts, font sizes, etc. Is there any way to get Word to read the CSS? Would it work if I somehow converted the CSS to be embedded in the HTML?

Adam Haile
+3  A: 

"Would it work if I somehow converted the CSS to be embedded in the HTML?"

Yes. I use an internal style sheet, as I mentioned.

Document Example:

<html>
<head>
<style type="text/css">
    h1 {text-align:center; font-size:12.0pt; font-family:Arial; font-weight:bold;}

    p {margin:0in; margin-bottom:0pt; font-size:10.0pt; font-family:Arial;}
    p.Address {text-align:center; font-family:Times; margin-bottom:10px;}
</style>
</head>
<body>
<p class="Address">The Street</p>
<h1>Head</h1>
</body>
</html>
Remou
We do this too, to allow our dynamic pages to be 'exported' to Word. The page content HTML is extracted and then inserted into the middle of a Word HTML doc template that already contains all the styles that the HTML needs.
Si Keep
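
If the page links to an external style sheet, one way to convert the CSS to be embedded is to swap each `<link rel="stylesheet">` tag for a `<style>` block before saving. A rough Python sketch follows; the `embed_stylesheets` helper is made up for illustration, and it assumes the caller has already loaded the CSS text for each `href`. It relies on regular expressions, so it suits simple, well-formed markup rather than arbitrary HTML.

```python
import re

LINK_TAG = re.compile(r"<link\b[^>]*>", re.IGNORECASE)
HREF = re.compile(r"""href\s*=\s*["']([^"']+)["']""", re.IGNORECASE)
REL_STYLESHEET = re.compile(r"""rel\s*=\s*["']?stylesheet""", re.IGNORECASE)


def embed_stylesheets(html, stylesheets):
    """Replace stylesheet <link> tags with internal <style> blocks.

    `stylesheets` maps each href value to the CSS text loaded by the caller.
    Links that are not style sheets, or whose href is unknown, are left as-is.
    """
    def replace(match):
        tag = match.group(0)
        href = HREF.search(tag)
        if href is None or not REL_STYLESHEET.search(tag):
            return tag
        css = stylesheets.get(href.group(1))
        if css is None:
            return tag
        return '<style type="text/css">\n' + css + "\n</style>"

    return LINK_TAG.sub(replace, html)


if __name__ == "__main__":
    page = '<html><head><link rel="stylesheet" href="site.css"></head><body></body></html>'
    print(embed_stylesheets(page, {"site.css": "p {font-family: Arial;}"}))
```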
+3  A: 

I use Aspose for working with Word; it makes everything a breeze: http://www.aspose.com/

Chris Canal
It seems very expensive (>$800) when all that is required is output, no?
Remou
A: 

There's a tool called JODConverter which hooks into OpenOffice to expose its file format converters. There are versions available as a webapp (it sits in Tomcat) which you post to, and as a command-line tool. I've been firing HTML at it and converting to .doc and PDF successfully. It's in a fairly big project that hasn't gone live yet, but I think I'm going to be using it. http://sourceforge.net/projects/jodconverter/

Andrew Hancox