I am looking for an efficient way to read the raw text from any MS Office document (Word, Excel or PowerPoint), then display a list of distinct words with a count of how many times each word is used. If possible, I would like to be able to exclude common words ('and', 'to', 'the', etc.).

What is the best way to achieve this in C#?

A: 

You should look into Lucene.NET. It can build word indexes from a variety of sources, including, I believe, Word documents.
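
Whichever library ends up extracting the text, the counting step on its own needs nothing beyond LINQ. Here is a minimal sketch, assuming the raw text is already available as a string; `WordCounter`, `CountWords`, and the stop-word list are hypothetical names made up for illustration, not part of Lucene.NET or any Office API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class WordCounter
{
    // Illustrative stop-word list (an assumption; extend it to suit your documents).
    private static readonly HashSet<string> StopWords =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "and", "to", "the", "a", "of", "in"
        };

    // Returns each distinct word with its count, most frequent first.
    // A "word" here is a run of letters, optionally with internal apostrophes
    // (e.g. "don't"); digits and punctuation are treated as separators.
    public static List<KeyValuePair<string, int>> CountWords(string text)
    {
        return Regex.Matches(text ?? string.Empty, @"\p{L}+(?:'\p{L}+)*")
            .Cast<Match>()
            .Select(m => m.Value.ToLowerInvariant())   // case-insensitive counting
            .Where(w => !StopWords.Contains(w))        // drop common words
            .GroupBy(w => w)
            .Select(g => new KeyValuePair<string, int>(g.Key, g.Count()))
            .OrderByDescending(p => p.Value)
            .ThenBy(p => p.Key, StringComparer.Ordinal) // stable order for ties
            .ToList();
    }
}
```

For example, `WordCounter.CountWords("The cat and the dog; the cat sat.")` yields cat = 2, dog = 1, sat = 1. For very large documents you may prefer to feed the text in chunks and accumulate into a `Dictionary<string, int>` rather than hold everything in one string.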

LBushkin