Hi,

For a monitoring project, I want to read a PDF line by line, compare each line to a string (a filename), and, if the string appears in that line, add the line to a list.

So far I have had a quick look at iTextSharp and PDFsharp, but they don't seem to be the right tools for the job, as they focus mostly on altering and printing PDFs.

Does anyone know another way of reading lines from a PDF, or should I keep trying with iTextSharp and PDFsharp?

thx.

A: 

As you know (I suppose), PDF is not a text file format, so the text has to be extracted first. There are many tools you can use for that.
Two examples:
- Xpdf pdftotext (www.foolabs.com/xpdf/): free, command-line executable
- PDFlib TET (www.pdflib.com): commercial, library (.NET, Java, COM, ...)
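
To make the command-line route concrete, here is a rough sketch in Python (not the asker's .NET setup, just an illustration). It assumes the pdftotext executable is installed and on the PATH. The function names `extract_text` and `lines_containing` are made up for this example. Extracted lines may not match the visual lines of the PDF exactly, so treat the result as approximate.

```python
import subprocess


def extract_text(pdf_path):
    """Run pdftotext (assumed to be on the PATH) and return the text."""
    # "-layout" asks pdftotext to keep the physical line layout;
    # "-" as the output file sends the text to stdout instead of a file.
    result = subprocess.run(
        ["pdftotext", "-layout", pdf_path, "-"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def lines_containing(text, needle):
    """Return every line of text that contains needle as a substring."""
    return [line for line in text.splitlines() if needle in line]
```

Usage would then be something like `lines_containing(extract_text("report.pdf"), "backup.zip")`, where both file names are placeholders. Keeping the filtering separate from the extraction means the same matching step should work on text from any other extractor.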

Fabrizio
+2  A: 

I use PDFBox with Lucene. It was easy to figure out how it works, and it does the job. It's open source and free.

Paco
A: 

Thanks, PDFBox did the trick very quickly indeed.