answers:

6

Does anyone know a good library for PDF rendering for Java? Ideally, it should support not only displaying the image but also retrieving the text from it, finding which text is at a certain location, etc.

+2  A: 

This question gets asked a lot. There's still nothing better than iText for creating PDFs. Rendering PDFs is a trickier prospect. Maybe start with pdfrenderer. I've used it before for printing PDFs directly from Java with good results. It seems to offer a nice display option too.

The text part is trickier because PDF doesn't store its text the way you might expect: it's designed for displaying and printing rather than for a more "word processing"-centric approach.
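To make that concrete: text extracted from a PDF typically arrives as positioned fragments in whatever order they were drawn, so the reading order has to be rebuilt from coordinates. The sketch below uses only the JDK and is not tied to any of the libraries mentioned here; the `Fragment` class, the `toLines` method, and the baseline tolerance are all hypothetical names and values chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class FragmentOrdering {

    /** Hypothetical positioned text run; (x, y) is the baseline start, origin at bottom-left. */
    static final class Fragment {
        final String text;
        final double x;
        final double y;

        Fragment(String text, double x, double y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    /** Groups fragments with nearly equal baselines into lines, top to bottom, left to right. */
    static List<String> toLines(List<Fragment> fragments, double tolerance) {
        List<Fragment> sorted = new ArrayList<>(fragments);
        // Larger y means higher on the page when the origin is at the bottom-left.
        sorted.sort(Comparator.comparingDouble((Fragment f) -> -f.y));

        List<List<Fragment>> rows = new ArrayList<>();
        for (Fragment f : sorted) {
            List<Fragment> last = rows.isEmpty() ? null : rows.get(rows.size() - 1);
            if (last != null && Math.abs(last.get(0).y - f.y) <= tolerance) {
                last.add(f); // same visual line as the previous fragment
            } else {
                List<Fragment> row = new ArrayList<>();
                row.add(f);
                rows.add(row);
            }
        }

        List<String> lines = new ArrayList<>();
        for (List<Fragment> row : rows) {
            row.sort(Comparator.comparingDouble((Fragment f) -> f.x));
            StringBuilder sb = new StringBuilder();
            for (Fragment f : row) {
                if (sb.length() > 0) {
                    sb.append(' ');
                }
                sb.append(f.text);
            }
            lines.add(sb.toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        // Fragments deliberately listed out of reading order.
        List<Fragment> page = Arrays.asList(
                new Fragment("world", 120, 700),
                new Fragment("Second", 72, 680),
                new Fragment("Hello", 72, 700),
                new Fragment("line", 130, 680));
        System.out.println(toLines(page, 2.0));
    }
}
```

Real documents add complications this ignores (rotated text, multiple columns, kerning-level fragments), which is part of why extraction quality varies between libraries.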

There's a brilliant book accompanying iText called iText In Action which is full of good examples of how to do things with the library. I'd maybe start there to find out if it can do exactly what you want.

banjollity
Does it render a PDF document?
dfa
To be honest, I misread the question. I've edited my response now.
banjollity
+1  A: 

You could have a look at Apache FOP. You will have to learn XSL-FO, but it's so much easier to get the layout right. Working with iText can be a pain.

Kimble
Does it really work? I know about PDF creation, but how can I _display_ a PDF file using Apache FOP?
DR
A: 

There are several PDF renderers available under an LGPL license. Besides PDFRenderer, there are IcePDF and JPedal.

A: 

I added text parsing to iText late last year. The iText text parser is more than capable of giving coordinates for found text. However, iText won't render the PDF on-screen, so this may or may not be useful for your needs. My experience with pdfrenderer is that it's OK but kind of slow, and it doesn't handle the full range of PDFs that are out there.
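Once a parser has given you text with coordinates, answering "which text is at this location?" is a bounding-box lookup. The following is a minimal JDK-only sketch of that idea; it does not use the iText API, and the `Chunk` class, its fields, and `textAt` are hypothetical names standing in for whatever the parser actually returns.

```java
import java.util.Arrays;
import java.util.List;

public class TextHitTest {

    /** Hypothetical extracted text chunk with its bounding box in page coordinates. */
    static final class Chunk {
        final String text;
        final double x;      // left edge
        final double y;      // bottom edge
        final double width;
        final double height;

        Chunk(String text, double x, double y, double width, double height) {
            this.text = text;
            this.x = x;
            this.y = y;
            this.width = width;
            this.height = height;
        }

        /** True if the point lies inside or on the border of the box. */
        boolean contains(double px, double py) {
            return px >= x && px <= x + width && py >= y && py <= y + height;
        }
    }

    /** Returns the text of the first chunk containing the point, or null if none does. */
    static String textAt(List<Chunk> chunks, double px, double py) {
        for (Chunk c : chunks) {
            if (c.contains(px, py)) {
                return c.text;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<Chunk> page = Arrays.asList(
                new Chunk("Invoice", 72, 700, 60, 12),
                new Chunk("Total:", 72, 100, 40, 12));
        System.out.println(textAt(page, 80, 705)); // point inside the first box
        System.out.println(textAt(page, 300, 400)); // point in empty space
    }
}
```

If the lookup is driven by mouse clicks on a rendered page, the click position has to be converted from screen coordinates (origin at top-left, scaled by zoom) into page coordinates first.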

Kevin Day
+1  A: 

pdfrenderer doesn't parse documents generated by Acrobat 9; use IcePDF instead.

th3byrdm4n
A: 

I've looked at a few of these.

iText is for generating PDFs.

For reading PDFs, you need one of the following:

The open-source pdf-renderer from Sun is an older, unsupported library that is not good at handling complex embeds and TrueType fonts.

The open-source PDFBox, which appears to come from some of the Apache FOP team, is currently only slightly better than pdf-renderer (sorry guys).

The two professional options are:

JPedal, which is not free to use, but is excellent.

IcePDF, which has been released as open source, though you need to pay for a commercial license.

Both of the above appear to be excellent.

Flarkar Mildijamm