Looking for a PDF file parser.

+1 A:

Not sure if it supports the functionality you need, but we've been using ABCpdf with some success.

Jeremy 2009-02-09 21:34:36

I don't think ABCpdf supports parsing.

Richard Szalay 2009-02-09 21:41:40

@Richard Szalay, I wasn't sure. The feature matrix says it supports reading PDFs, but whether it gives you an object model in the API to access parts of the PDF is something I can't say for certain.

Jeremy 2009-02-09 21:54:08

I wouldn't go so far as to reject its advertised feature set :) It didn't support parsing when I last used it, but its writing capabilities certainly did the job well.

Richard Szalay 2009-02-09 22:32:12

ABCpdf does expose an object model; it's what they call Atoms.

Mark S. Rasmussen 2009-02-10 07:58:46

+3 A:

The PDF File Parser article on xactpro seems to be exactly what you need. It explains the format of the PDF and comes with full source code for a parser (and another project for visualisation of the model).

The parser uses format-specific terms, but you could easily use the visualiser to learn what to look for.
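To give a feel for those format-specific terms, here is a minimal sketch (not taken from the article) that pulls three landmarks out of raw PDF bytes: the version in the `%PDF-` header, the byte offset that follows the final `startxref` keyword, and a rough count of `N G obj` indirect-object markers. The function name `pdf_summary` and the tiny sample document are hypothetical, and the sketch assumes the header sits at byte 0 and that nothing is compressed or encrypted, so treat it as a way to learn the vocabulary rather than a real parser.

```python
import re


def pdf_summary(data: bytes) -> dict:
    """Extract a few top-level landmarks from raw PDF bytes (illustrative only)."""
    # A PDF starts with a header line such as "%PDF-1.4".
    header = re.match(rb"%PDF-(\d+\.\d+)", data)
    if header is None:
        raise ValueError("missing %PDF header")

    # Near the end, "startxref" is followed by the byte offset of the
    # cross-reference table and then the "%%EOF" marker. Files with
    # incremental updates may contain several; the last one is current.
    tails = re.findall(rb"startxref\s+(\d+)\s+%%EOF", data)
    if not tails:
        raise ValueError("missing startxref / %%EOF")

    # Indirect objects are introduced by "<number> <generation> obj".
    # This regex is a rough count; it can over-count inside binary streams.
    objects = re.findall(rb"\b\d+\s+\d+\s+obj\b", data)

    return {
        "version": header.group(1).decode("ascii"),
        "xref_offset": int(tails[-1]),
        "indirect_objects": len(objects),
    }


if __name__ == "__main__":
    # Hypothetical hand-written sample, just large enough to show the keywords.
    sample = (
        b"%PDF-1.4\n"
        b"1 0 obj\n<< /Type /Catalog >>\nendobj\n"
        b"xref\n0 2\n0000000000 65535 f \n0000000009 00000 n \n"
        b"trailer\n<< /Size 2 /Root 1 0 R >>\n"
        b"startxref\n45\n%%EOF\n"
    )
    print(pdf_summary(sample))
```

Once `xref`, `trailer`, `startxref` and `obj`/`endobj` look familiar, the names in a full parser's object model are much easier to follow.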

Richard Szalay 2009-02-09 21:48:55

+2 A:

You can also take a look at Xpdf (http://www.foolabs.com/xpdf/download.html).

Mihai Nita 2009-02-10 07:29:46

+1 A:

Check out PDFBox.

Abhijith 2009-12-01 07:33:11
