I am looking for an efficient way to read the raw text from any MS Office document (Word, Excel, or PowerPoint) and then display a list of distinct words with a count of how many times each word is used. If possible, I would also like to exclude common words ('and', 'to', 'the', etc.).
What is the best way to achieve this in C#?