Does anybody know of an API/SDK or IFilter in .NET that can read the subject ('title' metadata) and text from the following files:

.PDF .DOC .XLS .PPT .CSV .TXT .DOCX .XLSX .PPTX + the OpenOffice and Open Document standards.

Open source would be awesome... but commercial is OK too.

I can't find anything anywhere!

A: 

I don't think you will be able to find a single IFilter that will be able to access the contents of all of those types. Typically, an IFilter will be for a specific technology.

For example, Adobe has one for PDFs, and Microsoft provides one for Office that handles Word, Excel, PowerPoint and CSV files (I believe it comes pre-installed with Windows).
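
If you do end up writing this yourself, the usual shape is one reader per format behind a lookup keyed on the file extension. Below is a rough sketch of that idea in Python, not an IFilter and not .NET. The names `READERS`, `extract`, `read_plain`, `read_csv` and `read_ooxml` are made up for illustration. It assumes UTF-8 for text files, and for the Office Open XML formats it only reads the title from `docProps/core.xml` inside the package, leaving body text, PDF, and the legacy binary formats out entirely.

```python
import csv
import io
import xml.etree.ElementTree as ET
import zipfile
from pathlib import Path

# Dublin Core namespace used by docProps/core.xml in OOXML packages.
DC = "{http://purl.org/dc/elements/1.1/}"


def read_plain(data: bytes) -> dict:
    # Assumption: plain text is UTF-8; undecodable bytes are replaced.
    return {"title": None, "text": data.decode("utf-8", errors="replace")}


def read_csv(data: bytes) -> dict:
    # Flatten each row to a space-separated line of its cells.
    stream = io.StringIO(data.decode("utf-8", errors="replace"), newline="")
    text = "\n".join(" ".join(row) for row in csv.reader(stream))
    return {"title": None, "text": text}


def read_ooxml(data: bytes) -> dict:
    # .docx/.xlsx/.pptx are zip packages; the title (if any) lives in
    # docProps/core.xml. Body text extraction is deliberately omitted.
    with zipfile.ZipFile(io.BytesIO(data)) as package:
        if "docProps/core.xml" not in package.namelist():
            return {"title": None, "text": None}
        root = ET.fromstring(package.read("docProps/core.xml"))
    return {"title": root.findtext(DC + "title"), "text": None}


# Hypothetical registry: one reader per extension, like one filter per format.
READERS = {
    ".txt": read_plain,
    ".csv": read_csv,
    ".docx": read_ooxml,
    ".xlsx": read_ooxml,
    ".pptx": read_ooxml,
}


def extract(filename: str, data: bytes) -> dict:
    # Pick the reader by (case-insensitive) extension.
    ext = Path(filename).suffix.lower()
    reader = READERS.get(ext)
    if reader is None:
        raise ValueError(f"no reader registered for {ext!r}")
    return reader(data)
```

Calling `extract("report.docx", data)` returns a dict with `title` and `text` keys. Supporting another format means adding one function and one entry to `READERS`, which is roughly how separately installed IFilters divide the work as well.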

adrianbanks
Another alternative for PDF text indexing is Foxit Software (http://www.foxitsoftware.com). I've found their PDF IFilter much more reliable than Adobe's.
dthrasher
A: 

Nooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo. Well, that's what I suspected. Thanks very much though, it confirms what I had feared.

Better start coding...

ben