I want to save a Word document as HTML using Word Viewer, without having Word installed on my machine. Is there any way to accomplish this in C#?

A: 

You will need to have MS Word installed to do this, I believe.

Check out this article for details on the implementation.

Tim S. Van Haren
Thanks for the reply, but I don't have MS Word installed on the machine, so I have to do this using Word Viewer only.
abcurl
A: 

According to this Stack Overflow question, it isn't possible with Word Viewer. You will need the full version of Word installed, because COM Interop works by automating Word itself.

Bryan
Thanks for the reply, but I don't have MS Word installed on the machine, so I have to do this using Word Viewer only.
abcurl
That's what I'm saying: I don't believe it is possible without the full version of Word. You could have a go using ZombieSheep's answer, but I doubt you will get very far, TBH. It would make more sense to buy a copy of Word and use COM interop.
Bryan
Word not installed = SOL?
Jason Down
Yes, MS Word is not installed at the client end, so I have to complete the job using the Word Viewer component only.
abcurl
You *can't* do it with Word Viewer. Period.
Bryan
+1  A: 

I think this will depend on the version of the Word document. If you have them in docx format, I believe they are stored within the file as XML data (but it is so long since I looked at the specification I am perfectly happy to be corrected on that).

ZombieSheep
Correct, docx files are XML. The format differs from Word 2003 to 2007 and is a complete pain to work with!
roryf
Yes, rename the .docx extension to .zip and you can access all the files that make up the document. But without the full version of Word and COM interop, he's going to have a hard time trying to achieve his goal from the XML. +1 btw, as it's the only way he's going to do it without Word.
Bryan
Yes, MS Word is not installed at the client end, so I have to complete the job using the Word Viewer component only.
abcurl
If it's stored in docx format, you can open and manipulate the XML without using Word Viewer or COM interop; otherwise this cannot be done without Word. @Bryan FYI, docx 2003 isn't a zip archive, it's just an XML file with base64-encoded images.
roryf
@Rory Fitzpatrick, try renaming a .docx to .zip and take a look for yourself. http://www.google.co.uk/search?q=.docx+rename+to+.zip
Bryan
@Bryan you're right, I was confusing .docx with Word 2003 .xml format
roryf
@Rory Fitzpatrick: Ah okay, fair enough. You're spot on about the XML though, as it's definitely in the .zip file.
Bryan
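To make the idea in the comments above concrete, here is a minimal C# sketch of reading a .docx as a zip archive and turning its paragraphs into very plain HTML, with neither Word nor Word Viewer involved. It assumes .NET 4.5 or later (for `System.IO.Compression.ZipFile`) and that the main document part sits at the conventional `word/document.xml` path. The class name `DocxToHtmlSketch` and its `Convert` method are hypothetical names for illustration. It keeps text only: styles, images, tables and lists are ignored, and text in nested constructs such as text boxes may appear more than once.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Net;
using System.Text;
using System.Xml.Linq;

// Hypothetical helper, not part of any library.
static class DocxToHtmlSketch
{
    // WordprocessingML main namespace used inside word/document.xml.
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns a bare-bones HTML page with one <p> per Word paragraph.
    public static string Convert(string docxPath)
    {
        using (ZipArchive zip = ZipFile.OpenRead(docxPath))
        {
            // Assumption: the main part lives at the conventional path.
            ZipArchiveEntry entry = zip.GetEntry("word/document.xml");
            if (entry == null)
                throw new InvalidDataException("word/document.xml not found; is this a .docx?");

            XDocument xml;
            using (Stream stream = entry.Open())
                xml = XDocument.Load(stream);

            var html = new StringBuilder("<html><body>\n");
            foreach (XElement paragraph in xml.Descendants(W + "p"))
            {
                // Join the text runs (<w:t>) that belong to this paragraph.
                string text = string.Concat(
                    paragraph.Descendants(W + "t").Select(t => t.Value));
                html.Append("<p>")
                    .Append(WebUtility.HtmlEncode(text))
                    .Append("</p>\n");
            }
            return html.Append("</body></html>\n").ToString();
        }
    }

    static void Main(string[] args)
    {
        if (args.Length != 2)
        {
            Console.Error.WriteLine("usage: DocxToHtmlSketch <input.docx> <output.html>");
            return;
        }
        File.WriteAllText(args[1], Convert(args[0]));
    }
}
```

On .NET Framework this needs references to `System.IO.Compression` and `System.IO.Compression.FileSystem`. It does nothing for the older binary .doc format, which is not a zip archive.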
A: 

If you're open to not using C#, you could print to file using PrimoPDF (which would turn the .doc into a .pdf) and then use a PDF-to-HTML converter to go the rest of the way. After that you can edit your HTML however you like.

dnagirl
A: 

Using the document conversion tools available in OpenOffice.org is probably the only possible option. The .doc format is only designed to be opened via Microsoft products, so any library dealing with it will need to have reverse engineered the entire format.

ternaryOperator
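As a rough illustration of the OpenOffice.org route from C#, the sketch below shells out to the office suite's `soffice` executable and asks it to convert a document to HTML. It assumes a build whose command line accepts `--headless --convert-to html --outdir`, which is the case for LibreOffice (a descendant of OpenOffice.org); older OpenOffice.org installs may not support these switches and may need a macro or the UNO API instead. The class name `OfficeConvertSketch`, the method `ConvertToHtml`, and the `sofficePath` parameter are hypothetical.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper, not part of any library.
static class OfficeConvertSketch
{
    // Runs soffice in headless mode and returns its exit code.
    // Assumption: sofficePath points at a soffice binary that
    // understands --convert-to (e.g. a LibreOffice install).
    public static int ConvertToHtml(string sofficePath, string inputPath, string outputDir)
    {
        var startInfo = new ProcessStartInfo
        {
            FileName = sofficePath,
            Arguments = string.Format(
                "--headless --convert-to html --outdir \"{0}\" \"{1}\"",
                outputDir, inputPath),
            UseShellExecute = false,
            CreateNoWindow = true
        };

        using (Process process = Process.Start(startInfo))
        {
            process.WaitForExit();
            return process.ExitCode;
        }
    }

    static void Main(string[] args)
    {
        if (args.Length != 3)
        {
            Console.Error.WriteLine("usage: OfficeConvertSketch <soffice path> <input.doc> <output dir>");
            return;
        }
        int exitCode = ConvertToHtml(args[0], args[1], args[2]);
        Console.WriteLine("soffice exited with code " + exitCode);
    }
}
```

The converted file is written to the output directory under the input's base name with an .html extension. The exit code alone is not a reliable success signal, so checking that the output file exists is a sensible extra step.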