Hi, I need to extract text from an old MS Word .doc file in C#. What is the easiest (or failing that, the best) way to get that done?
A:
First, add a reference to the MS Word object library. Go to Project => Add Reference, select the COM tab, then find and select "Microsoft Word 10.0 Object Library". The version number may differ depending on which version of Office is installed on your computer. Click OK.
After you have done that, you can use the following code. It opens an MS Word document and displays each paragraph in a message box:
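// Read an MS Word doc and show each paragraph in a message box.
// Needs "using System;" and "using System.Windows.Forms;" at the top of the file.
private void ReadWordDoc()
{
    Word.ApplicationClass wordApp = null;
    // Placeholder for optional arguments that are passed by reference
    object oNull = System.Reflection.Missing.Value;
    try
    {
        wordApp = new Word.ApplicationClass();
        // Define file path
        string fn = @"c:\test.doc";
        // Create objects for passing
        object oFile = fn;
        object oReadOnly = true;
        // Open document read-only. The argument count matches the Word 10.0
        // library; other library versions may expect a different number.
        Word.Document doc = wordApp.Documents.Open(ref oFile, ref oNull,
            ref oReadOnly, ref oNull, ref oNull, ref oNull, ref oNull,
            ref oNull, ref oNull, ref oNull, ref oNull, ref oNull,
            ref oNull, ref oNull, ref oNull);
        // Read each paragraph and show
        foreach (Word.Paragraph oPara in doc.Paragraphs)
            MessageBox.Show(oPara.Range.Text);
    }
    catch (Exception ex)
    {
        MessageBox.Show(ex.Message);
    }
    finally
    {
        // Quit Word even if an error occurred, so no Word process is left running
        if (wordApp != null)
            wordApp.Quit(ref oNull, ref oNull, ref oNull);
    }
}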
CraigS
2009-12-11 11:08:01
A:
Depending on your needs and budget, you might want to look at the Aspose.Words library. It's not cheap, but it might cut down on the effort needed to extract that text. The bonus is that you don't need MS Office installed on the deployment machine, which IMHO is a must if you are running this on a server.
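For illustration, here is a minimal sketch of what the extraction might look like with Aspose.Words. It assumes the Aspose.Words assembly is referenced in the project; the file path is just a placeholder borrowed from the other answer, and the class name is made up for the example. Treat it as a starting point rather than a tested solution.

```csharp
using System;
using Aspose.Words;

class ExtractDocText
{
    static void Main()
    {
        // Placeholder path (assumption): point this at your own .doc file.
        Document doc = new Document(@"c:\test.doc");

        // GetText() returns the text of the whole document as one string.
        // The result can include control characters (for example paragraph
        // marks), so you may want to clean it up before using it.
        string text = doc.GetText();

        Console.WriteLine(text);
    }
}
```

Because nothing here automates Word itself, no Word process is started, which is what makes this approach workable on a server.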
Jeremy Wiebe
2009-12-11 16:43:38