views:

701

answers:

4

Hi,

I want to read the contents of the following file types using C#:

  1. RTF
  2. PDF
  3. HTML
  4. MS Word

Is there a common API in .NET for reading the contents of all of these file types?

+2  A: 

There is no built-in support for reading most of those file types. HTML is plain text, so you can use System.IO.StreamReader to read it, but you must parse the markup yourself.
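
As a rough sketch of the HTML case: the class and method names below (HtmlTextReader, ReadRaw, StripTags) are made up for illustration, the file is assumed to be UTF-8, and the regex-based tag stripping is only a crude stand-in for real parsing (it will not cope with script blocks, comments, or malformed markup).

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the .NET framework.
public static class HtmlTextReader
{
    // HTML is plain text, so a StreamReader is enough to get the raw markup.
    // Assumption: the file is UTF-8 (StreamReader also honours a BOM if present).
    public static string ReadRaw(string path)
    {
        using (var reader = new StreamReader(path, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }

    // Crude text extraction: replace every tag with a space, decode entities
    // such as &amp;, then collapse runs of whitespace.
    public static string StripTags(string html)
    {
        string withoutTags = Regex.Replace(html, "<[^>]+>", " ");
        string decoded = WebUtility.HtmlDecode(withoutTags);
        return Regex.Replace(decoded, @"\s+", " ").Trim();
    }
}
```

Calling HtmlTextReader.StripTags(HtmlTextReader.ReadRaw(path)) gives you the visible text in one string. For anything beyond simple pages, a dedicated HTML parser is the safer choice.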

There are third-party components which will read these file types, but I am not sure if there is a single all-encompassing component.

For PDFs, I believe iTextSharp allows you to read them.
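
Something along these lines should work for the PDF case, assuming an iTextSharp 5.x release (the one that ships the iTextSharp.text.pdf.parser namespace) is referenced. PdfTextReader and ReadAllPages are illustrative names of my own, and the quality of the extracted text depends heavily on how the PDF was produced.

```csharp
using System.Text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

// Hypothetical helper; requires a reference to iTextSharp 5.x.
public static class PdfTextReader
{
    // Extracts the text of every page and joins the pages with line breaks.
    public static string ReadAllPages(string path)
    {
        var text = new StringBuilder();
        var reader = new PdfReader(path);
        try
        {
            // iTextSharp page numbers start at 1.
            for (int page = 1; page <= reader.NumberOfPages; page++)
            {
                text.AppendLine(PdfTextExtractor.GetTextFromPage(reader, page));
            }
        }
        finally
        {
            reader.Close();
        }
        return text.ToString();
    }
}
```

Scanned PDFs that contain only images will come back empty, since no OCR is involved.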

For RTF/Word, you can use the Office Primary Interop Assemblies, which automate a copy of Word installed on the machine.
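
A minimal sketch of the interop route, assuming Word is installed, the project references Microsoft.Office.Interop.Word, and you are on C# 4 or later (for optional and named arguments on COM calls). WordTextReader and ReadText are names I made up. Word can open .rtf files as well as .doc/.docx, so the same call covers both.

```csharp
using Word = Microsoft.Office.Interop.Word;

// Hypothetical helper; needs Word installed and the Word PIA referenced.
public static class WordTextReader
{
    // Opens the document read-only in a hidden Word instance and returns its text.
    public static string ReadText(string path)
    {
        // Using the _Application/_Document interfaces avoids the compiler
        // warning about Quit/Close being both a method and an event.
        Word._Application app = new Word.Application();
        Word._Document doc = null;
        try
        {
            doc = app.Documents.Open(FileName: path, ReadOnly: true);
            return doc.Content.Text;
        }
        finally
        {
            if (doc != null)
            {
                doc.Close(SaveChanges: false);
            }
            app.Quit();
        }
    }
}
```

Starting Word for every file is slow, and Office automation is generally not a good fit for unattended server-side code, so treat this as a desktop-only option.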

Bob
A: 

If you are going to full-text index the data, look into using Lucene; it can handle those file types.

RedFilter
A: 

I've used Aspose before. It's a very powerful product, but it's reasonably pricey, so I would only recommend it if your application also needs to create new Word/PDF/RTF documents.

I agree with the other comments about just using System.IO for reading HTML files.

Nick Josevski
A: 

Has anyone used PdfBox to create PDF files?

Steve Chapman