Hi,

What would be the best way to convert an HTML page (with CSS, tables, images, etc.) to Word or RTF format? I already know about adding the

Content-Type: application/msword

header, but that's not an option because we need the images embedded in the document so that it can be viewed without an active internet connection.

I need a .NET library (preferably free, but commercial is fine) or a command-line utility, since this has to run in an ASP.NET application hosted on a shared server :|.

A: 

There are several possibilities for converting HTML to RTF. These links should get you started:

Converting to MS Word .doc is much harder and probably not worthwhile for you. To see why it is such a pain, read Joel's interesting article on the .doc format. If you have to write .doc for some reason, COM interop with MS Office is probably your best bet.

Colin Pickard
I tried DocFrac and it produced plain text with some garbage.
Ali Kazmi
+1  A: 

If you are using Word 2003 or 2007, you can convert XHTML documents to Word XML documents using XSLT. If you google for "html to docx xsl" you will mostly find examples of the opposite (converting docx to HTML), but you could use one of those as the basis for a conversion in the other direction. The only real challenge is downloading the images and embedding them in the document, but that is also possible.
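
To make the idea concrete, here is a rough sketch of the kind of mapping such a stylesheet would express. It is written imperatively in Python rather than XSLT, purely for illustration, and it is not a complete converter: it only handles `<p>` and `<img>`. It assumes the Word 2003 XML layout where image bytes go base64-encoded into a `w:binData` element that a `v:imagedata` then references by name. The function name `xhtml_to_wordml` and the `load_image` callback are made up for this example, and Word will likely want more attributes (for instance a size on `v:shape`) than are shown here.

```python
import base64
import os.path
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
W_NS = "http://schemas.microsoft.com/office/word/2003/wordml"
V_NS = "urn:schemas-microsoft-com:vml"

# Keep the conventional w: and v: prefixes in the serialized output.
ET.register_namespace("w", W_NS)
ET.register_namespace("v", V_NS)


def w(tag):
    """Qualified name in the Word 2003 XML namespace."""
    return "{%s}%s" % (W_NS, tag)


def xhtml_to_wordml(xhtml_text, load_image):
    """Map each XHTML <p> to a w:p and each <img> to an embedded w:pict.

    load_image is a hypothetical callback (src string -> raw bytes); in a
    real application it would download or read the image file.
    """
    root = ET.fromstring(xhtml_text)
    doc = ET.Element(w("wordDocument"))
    body = ET.SubElement(doc, w("body"))
    image_count = 0

    for el in root.iter():
        if el.tag == "{%s}p" % XHTML_NS:
            # Flatten inline markup (<b>, <i>, ...) to plain text.
            run = ET.SubElement(ET.SubElement(body, w("p")), w("r"))
            ET.SubElement(run, w("t")).text = "".join(el.itertext()).strip()
        elif el.tag == "{%s}img" % XHTML_NS:
            image_count += 1
            src = el.get("src", "")
            ext = os.path.splitext(src)[1] or ".png"
            name = "wordml://image%d%s" % (image_count, ext)
            run = ET.SubElement(ET.SubElement(body, w("p")), w("r"))
            pict = ET.SubElement(run, w("pict"))
            # The image bytes travel inside the document, base64-encoded.
            data = ET.SubElement(pict, w("binData"), {w("name"): name})
            data.text = base64.b64encode(load_image(src)).decode("ascii")
            # The VML shape points back at the embedded data by name.
            shape = ET.SubElement(pict, "{%s}shape" % V_NS)
            ET.SubElement(shape, "{%s}imagedata" % V_NS, {"src": name})

    header = '<?xml version="1.0"?>\n<?mso-application progid="Word.Document"?>\n'
    return header + ET.tostring(doc, encoding="unicode")
```

Because the images are carried inside the file itself, the result does not need a network connection to display them, which is the requirement in the question. Tables and CSS would each need their own mapping and are left out here.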

Rune Grimstad