Hi,

How do I load an MS Word document (.doc or .docx) into memory (a variable) without doing this?

wordApp.Documents.Open

I don't want to open MS Word; I just want the text inside.

You gave me an answer for DOCX, but what about DOC? I want a free, high-performance solution, not one that opens 12,000 instances of Word to process all of them. :( Aspose is a commercial product, and $900 is way too much for what I do.

+2  A: 

If you are dealing with .docx, you can do this without any interop with Word. A .docx file is actually a ZIP archive containing XML files, so you can read the XML directly. Please refer to the links below:

http://conceptdev.blogspot.com/2007/03/open-docx-using-c-to-extract-text-for.html

Office (2007) Open XML File Formats
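
To illustrate the ZIP-plus-XML idea, here is a minimal sketch in Python (the linked article uses C#; the same steps apply in any language with ZIP and XML support). It assumes the main body is stored in the archive member word/document.xml and that visible text sits in w:t elements inside w:p paragraphs. The function name extract_docx_text is made up for this example. It ignores headers, footers and footnotes, and text inside text boxes may come out twice, so treat it as a starting point rather than a complete extractor.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by elements in word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def extract_docx_text(source):
    """Return the plain text of a .docx, one line per paragraph.

    `source` may be a file path or a binary file-like object.
    (Hypothetical helper name, used only for this illustration.)
    """
    with zipfile.ZipFile(source) as archive:
        # Assumption: the main document body lives in this archive member.
        xml_bytes = archive.read("word/document.xml")

    root = ET.fromstring(xml_bytes)
    paragraphs = []
    # Each <w:p> is a paragraph; its visible text is held in <w:t> elements.
    for paragraph in root.iter(W_NS + "p"):
        runs = [node.text or "" for node in paragraph.iter(W_NS + "t")]
        paragraphs.append("".join(runs))
    return "\n".join(paragraphs)
```

Calling extract_docx_text("report.docx") (a hypothetical file name) returns the text as a single string, and Word is never started.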

Jobi Joy
+2  A: 

For docx-formatted Word documents, I found this interesting article on The Code Project:

Using DocxToText to Extract Text from DOCX Files

In the article the author discusses stripping out just the words themselves.

For your doc (non-docx) Word documents, other than using the Office APIs and (in the background) spawning an instance of Word, you could try shelling out to one of the many doc-to-docx converters on the market and then applying the above process to both formats.

Jason Whitehorn
Is there any free doc to docx solution?
Skuta
A: 

Docx was answered, but what about DOC? Is there any free library to convert doc to docx?

Skuta
A: 

I don't mean to be an antagonist, but why?

I've extracted data from Word documents on Linux servers using Word2X or AbiWord, and depending on the number and variety of documents there will always be errors with the extraction. It gets worse the more bullets, page breaks, document sections and other "special" features there are.

I understand there are options now to automate OpenOffice to process documents, but my advice is, if you can, just use Word to process Word documents.

bill_the_loser
I want to process 12,000 Word documents every day. Guess why I don't want to open 12,000 instances of Word...
Skuta
+1  A: 

I recently did some research on this topic. It turns out that to manipulate Word files programmatically without opening Word itself, you need some very expensive tools.

There's an article over at Code Project on manipulating Word that you might find useful. The author built a C# COM wrapper for dealing with calls to Word. It looks like it actually pops open the Word application, though.

This post over at the Neowin forums looks promising too. It includes quite a few P/Invoke calls for the purpose of text extraction.

Maybe if you could find a way to keep the window hidden it would be acceptable.

Rick Minerich
A: 

Aspose has a component to read, modify and write Word documents. Here is the product link: Aspose.Words for .NET and Java

Aspose.Words enables .NET and Java applications to read, modify and write Word® documents without utilizing Microsoft Word®. Aspose.Words supports a wide array of features including document creation, content and formatting manipulation, powerful mail merge abilities, comprehensive support of DOC, OOXML, RTF, WordprocessingML, HTML, OpenDocument and PDF formats. Aspose.Words is truly the most affordable, fastest and feature rich Word component on the market.

Cihan Ucar
I asked for a free library -> Aspose: US$899
Skuta
A: 

Aspose is a commercial product. Is there nothing free out there?

Skuta
+3  A: 

Skuta,

You can use wordconv.exe, which is part of the Office Compatibility Pack, to convert from doc to docx.

http://www.microsoft.com/downloads/details.aspx?familyid=941b3470-3ae9-4aee-8f43-c6bb74cd1466&displaylang=en

Just call the command like so: "C:\Program Files\Microsoft Office\Office12\wordconv.exe" -oice -nme InputFile OutputFile

I'm not sure if you need Word installed for it to run, but it does work. I use it locally as a Windows shell command to convert old Office files to the 2007 format whenever I want.
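
For a large batch, the same command could be driven from a script. Below is a rough Python sketch under some assumptions: the executable path and the -oice -nme flags are taken straight from the command above, the function names are invented for this example, and treating a non-zero exit code as a failure is a guess I have not verified for wordconv.exe.

```python
import subprocess
from pathlib import Path

# Path quoted in the command above; it may differ on other installations.
WORDCONV = r"C:\Program Files\Microsoft Office\Office12\wordconv.exe"


def build_wordconv_command(doc_path, docx_path, exe=WORDCONV):
    """Return the argument list for one conversion (hypothetical helper)."""
    return [exe, "-oice", "-nme", str(doc_path), str(docx_path)]


def convert_folder(in_dir, out_dir, exe=WORDCONV):
    """Convert every .doc file in in_dir and return the inputs that failed."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    failed = []
    for doc in sorted(Path(in_dir).iterdir()):
        # Skip anything that is not a legacy .doc file (including .docx).
        if not doc.is_file() or doc.suffix.lower() != ".doc":
            continue
        target = out / (doc.stem + ".docx")
        result = subprocess.run(build_wordconv_command(doc, target, exe))
        # Assumption: a non-zero exit code means the conversion failed.
        if result.returncode != 0:
            failed.append(doc)
    return failed
```

Files are converted one at a time, so each conversion starts only a short-lived converter process. Once the files are in .docx form, the text can be read from the ZIP/XML as described in the docx answers.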

Cheers,

Kyle

A: 

There's a free docx library at invoke.co.nz (registration needed).

Doesn't look free to me.
joegtp
A: 

I have uploaded a Word file, and now I want to show the contents of the file exactly as they are. How do I do it?

shweta