Hi, I need to extract the text from an old MS Word .doc file in C#. What is the easiest (or, failing that, the best) way to do that?

+1  A: 

This article might help:

Reading a Word document using C#

cxfx
+1  A: 

First, you need to add a reference to the MS Word object library. Go to Project => Add Reference, select the COM tab, then find and select "Microsoft Word 10.0 Object Library". The version number might be different on your computer. Click OK.

After you have done that, you can use the following code. It opens an MS Word doc and displays each paragraph in a message box:

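// Read an MS Word doc and show each paragraph in a message box.
// Assumes the interop reference is reachable as "Word"; if it was imported as
// Microsoft.Office.Interop.Word, add: using Word = Microsoft.Office.Interop.Word;
private void ReadWordDoc()
{
    Word.ApplicationClass wordApp = null;
    Word._Document doc = null;

    // Placeholder for optional COM arguments
    object oNull = System.Reflection.Missing.Value;

    try
    {
        wordApp = new Word.ApplicationClass();

        // Define file path
        string fn = @"c:\test.doc";

        // Create objects for passing (COM interop takes ref object arguments)
        object oFile = fn;
        object oReadOnly = true;

        // Open the document read-only
        doc = wordApp.Documents.Open(ref oFile, ref oNull,
                ref oReadOnly, ref oNull, ref oNull, ref oNull, ref oNull,
                ref oNull, ref oNull, ref oNull, ref oNull, ref oNull,
                ref oNull, ref oNull, ref oNull);

        // Read each paragraph and show it
        foreach (Word.Paragraph oPara in doc.Paragraphs)
            MessageBox.Show(oPara.Range.Text);
    }
    catch (Exception ex)
    {
        MessageBox.Show(ex.Message);
    }
    finally
    {
        // Close the document and quit Word even if an exception was thrown,
        // otherwise a hidden WINWORD.EXE process is left running
        object oNoSave = Word.WdSaveOptions.wdDoNotSaveChanges;
        if (doc != null)
            doc.Close(ref oNoSave, ref oNull, ref oNull);
        if (wordApp != null)
            wordApp.Quit(ref oNoSave, ref oNull, ref oNull);
    }
}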
CraigS
A: 

Depending on your needs and budget, you might want to look at the Aspose.Words library. It's not cheap, but it might cut down on the effort needed to extract that text. The bonus is that you don't need MS Office installed on the deployment machine, which IMHO is a must if you are running this on a server.
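
For a rough idea of what that could look like, here is a minimal sketch. It assumes the Aspose.Words for .NET assembly is referenced and licensed, and it reuses the c:\test.doc path from the answer above purely as a placeholder. The class name ExtractDocText is made up for the example.

```csharp
using System;
using Aspose.Words;

class ExtractDocText
{
    static void Main()
    {
        // Placeholder path; point this at your own .doc file
        Document doc = new Document(@"c:\test.doc");

        // Render the whole document as plain text
        string text = doc.ToString(SaveFormat.Text);

        Console.WriteLine(text);
    }
}
```

Nothing here starts Word itself, so there is no WINWORD.EXE process to clean up afterwards.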

Jeremy Wiebe