Get distinct word list and count from ms office docs using C# | ansaurus

tags:

c#
ms-office

views:

213

answers:

1

Q:

Get distinct word list and count from ms office docs using C#

I am looking for an efficient way of reading the raw text from any MS Office document (Word, Excel or PowerPoint), then displaying a list of the distinct words and a count of how many times each word is used. If possible, I would like to be able to exclude common words ('and', 'to', 'the', etc.).

What is the best way to achieve this in C#?
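Whatever tool ends up extracting the text, the counting and stop-word steps described in the question are straightforward in plain C#. The sketch below assumes the raw text has already been pulled out of the document as a string. The class name `WordCounter`, the word pattern and the short stop-word list are illustrative assumptions, not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class WordCounter
{
    // Illustrative stop-word list (an assumption); extend it as needed.
    private static readonly HashSet<string> StopWords =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "a", "an", "and", "in", "is", "of", "the", "to"
        };

    // Returns each distinct word with its number of occurrences,
    // most frequent first, ties broken by ordinal string order.
    // Words are lower-cased so "The" and "the" count as the same word.
    public static List<KeyValuePair<string, int>> CountWords(string text)
    {
        return Regex.Matches(text, @"[\p{L}\p{N}']+")          // runs of letters, digits, apostrophes
            .Cast<Match>()
            .Select(m => m.Value.Trim('\'').ToLowerInvariant()) // strip surrounding quotes: 'and' -> and
            .Where(w => w.Length > 0 && !StopWords.Contains(w)) // drop empty tokens and stop words
            .GroupBy(w => w)
            .Select(g => new KeyValuePair<string, int>(g.Key, g.Count()))
            .OrderByDescending(p => p.Value)
            .ThenBy(p => p.Key, StringComparer.Ordinal)
            .ToList();
    }
}
```

Returning a sorted list rather than a dictionary is only a convenience for display; `ToDictionary` would work just as well if lookups matter more than ordering.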

A:

You should look into Lucene.NET: it can build word indexes from a variety of sources, including, I believe, Word documents.

LBushkin 2009-07-13 14:31:43
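The answer above points to Lucene.NET. As an alternative for the text-extraction step that needs only the .NET base class library, note that .docx files (the default Word format since Office 2007) are ZIP packages of XML parts, with the body text stored in `word/document.xml`. The following is a minimal sketch under that assumption: the `DocxText` class name is hypothetical, legacy binary .doc files are not handled, and tabs, line breaks, headers, footers and footnotes are ignored.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

public static class DocxText
{
    // WordprocessingML namespace used by word/document.xml in .docx packages.
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the body text of a .docx package, one line per paragraph (w:p).
    // Runs (w:t) are joined without separators because Word may split a single
    // word across several runs. Simplification: text inside text boxes can be
    // repeated, since nested paragraphs are also visited by their outer paragraph.
    public static string ExtractText(Stream docx)
    {
        using (var zip = new ZipArchive(docx, ZipArchiveMode.Read))
        {
            ZipArchiveEntry entry = zip.GetEntry("word/document.xml")
                ?? throw new InvalidDataException("word/document.xml not found; not a .docx package?");
            using (Stream xml = entry.Open())
            {
                XDocument doc = XDocument.Load(xml, LoadOptions.PreserveWhitespace);
                var paragraphs = doc.Descendants(W + "p")
                    .Select(p => string.Concat(p.Descendants(W + "t").Select(t => t.Value)));
                return string.Join(Environment.NewLine, paragraphs);
            }
        }
    }
}
```

Excel (.xlsx) and PowerPoint (.pptx) files use the same ZIP-of-XML packaging but keep their text in different parts and elements, so each format needs its own reader. On .NET Framework the project must reference the System.IO.Compression assembly; the resulting string can then be passed to whatever word-counting step you use.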
