Is there a good library for extracting text from a PDF? I'm willing to pay for it if I have to.

Something that works with C# or classic ASP (VBScript) would be ideal, and I also need to be able to separate the pages from the PDF.

This question had some interesting suggestions, especially pdftotext, but I'd like to avoid calling out to an external command-line app if I can.

A: 

Here is a good list: Open Source Libs for PDF/C#

Most of these are geared toward creating PDFs, but they should have read capability as well.

There is this one as well: iText

I have only played with iText before. Nothing major.
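
For what it's worth, here is a minimal sketch of per-page text extraction. It assumes the C# port of iText (iTextSharp 5.x) is referenced in the project, and the file name `input.pdf` is only a placeholder. I haven't tested it against tricky PDFs, so treat it as a starting point rather than a finished solution.

```csharp
using System;
using iTextSharp.text.pdf;         // assumes iTextSharp 5.x is referenced
using iTextSharp.text.pdf.parser;  // PdfTextExtractor lives here

class PdfTextDump
{
    static void Main(string[] args)
    {
        // Placeholder path; pass a real file as the first argument.
        string path = args.Length > 0 ? args[0] : "input.pdf";

        PdfReader reader = new PdfReader(path);
        try
        {
            // Page numbers in iText are 1-based.
            for (int page = 1; page <= reader.NumberOfPages; page++)
            {
                // Extract the text of one page at a time, which keeps
                // the pages separate as the question asks.
                string text = PdfTextExtractor.GetTextFromPage(reader, page);
                Console.WriteLine("--- page {0} ---", page);
                Console.WriteLine(text);
            }
        }
        finally
        {
            reader.Close();
        }
    }
}
```

Scanned PDFs that contain only images will come back empty, since this reads the text layer and does no OCR.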

Doanair
+2  A: 

You can use the IFilter interface built into Windows to extract text and properties (author, title, etc.) from any supported file type. It's a COM interface, so you would have to use the .NET interop facilities.

You'd also have to download the free PDF IFilter driver from Adobe.

Ferruccio
A: 

We've used Aspose with good results.

Chuck