I need a Java library to convert PDFs to TIFF images. The PDFs are faxes, and I will be converting to TIFF so that I can then do barcode recognition on the image. Can anyone recommend a good library (free or $$) for conversion from PDF to TIFF?

+3  A: 

I can't recommend any code library, but it's easy to use Ghostscript to convert PDF into bitmap formats. I've personally used the script below (which also uses the netpbm utilities) to convert the first page of a PDF into a JPEG thumbnail:

#!/bin/sh

/opt/local/bin/gs -q -dLastPage=1 -dNOPAUSE -dBATCH -dSAFER -r300 \
    -sDEVICE=pnmraw -sOutputFile=- "$@" |
    pnmcrop |
    pnmscale -width 240 |
    cjpeg

You can use -sDEVICE=tiff... (for example tiffg3 or tiffg4 for fax-style bilevel output) to get direct TIFF output in various TIFF sub-formats from Ghostscript.
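For the Java side of the question, one way to use this is to launch Ghostscript as an external process. The following is only a minimal sketch under assumptions: the class and method names are hypothetical, the executable path and file names are supplied by the caller, and tiffg4 (Ghostscript's bilevel CCITT Group 4 device) is picked as one example of the tiff... devices. The remaining flags are the ones used in the script above.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of any library.
class GhostscriptTiff {

    // Builds the Ghostscript argument list; nothing is executed here.
    static List<String> buildCommand(String gsPath, String pdf, String tiff, int dpi) {
        List<String> cmd = new ArrayList<>();
        cmd.add(gsPath);                  // e.g. "gs", or a full path to the executable
        cmd.add("-q");
        cmd.add("-dNOPAUSE");
        cmd.add("-dBATCH");
        cmd.add("-dSAFER");
        cmd.add("-r" + dpi);              // resolution in dpi
        cmd.add("-sDEVICE=tiffg4");       // assumption: Group 4 bilevel TIFF output
        cmd.add("-sOutputFile=" + tiff);
        cmd.add(pdf);
        return cmd;
    }

    // Runs Ghostscript, waits for it to finish, and returns its exit code.
    static int convert(String gsPath, String pdf, String tiff, int dpi)
            throws IOException, InterruptedException {
        Process p = new ProcessBuilder(buildCommand(gsPath, pdf, tiff, dpi))
                .inheritIO()              // pass Ghostscript's messages through to the console
                .start();
        return p.waitFor();
    }
}
```

A call such as GhostscriptTiff.convert("gs", "fax.pdf", "fax.tif", 300) would return Ghostscript's exit code; a non-zero value should be treated as a failed conversion.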

Alnitak
I have used a Ghostscript solution before, but it is simply too slow for the volume I need to handle.
RedFilter
A: 

Maybe it is not necessary to convert the PDF into TIFF at all. The fax is most likely stored as an embedded image in the PDF, so you could simply extract those images. That should be possible with the iText library mentioned earlier.

I don't know if this is easier than the other approach.

+1  A: 

You might be interested in this thread.

laz
A: 

No, iText cannot convert PDFs to TIFF.

However, there are commercial libraries that can do that. jPDFImages is a 100% Java library that can convert PDF to images in TIFF, JPEG or PNG formats (and maybe JBIG; I am not sure). It can also do the reverse and create PDFs from images. It starts at $300 for a server.

+3  A: 

Disclaimer: I work for Atalasoft

We have an SDK that can convert PDF to TIFF. The rendering is powered by Foxit Software, which makes a very powerful and efficient PDF renderer.

Lou Franco
A: 

iText won't work!

A: 

Take a look at: http://pdfbox.apache.org/
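If you go with a pure-Java renderer such as PDFBox, the rendering step would normally give you each page as a java.awt.image.BufferedImage; that part is library-specific and is not shown here. Writing such an image out as TIFF then needs no extra library, assuming Java 9 or later, where javax.imageio includes a TIFF writer (on older JREs ImageIO.write returns false unless a TIFF plugin is installed). The class and method names below are hypothetical:

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.imageio.ImageIO;

// Hypothetical helper, not part of PDFBox or any other library.
class TiffWriterSketch {

    // Writes one already-rendered page as a TIFF file.
    // Returns false if the running JRE has no TIFF writer registered.
    static boolean writeTiff(BufferedImage page, File out) {
        try {
            return ImageIO.write(page, "tiff", out);
        } catch (IOException e) {
            // Rethrown unchecked to keep the sketch short.
            throw new UncheckedIOException(e);
        }
    }
}
```

For fax pages, passing a 1-bit image (BufferedImage.TYPE_BYTE_BINARY) keeps the output bilevel, which is usually what barcode readers expect from scanned faxes.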

gusti
+2  A: 

We also convert PDFs to G3 TIFFs, at both high and low resolution. In my experience the best tool you can get is the Adobe PDF SDK; the only problem is its insane price, so we don't use it.

What works fine for us is Ghostscript. Recent versions are quite robust and render the majority of PDFs correctly, and we have quite a few of them coming in during the day. In production the conversion is done through gsdll32.dll, but if you want to try it out, use the following command line:

gswin32c -dNOPAUSE -dBATCH -dMaxStripSize=8192 -sDEVICE=tiffg3 -r204x196 -dDITHERPPI=200 -sOutputFile=test.tif prefix.ps test.pdf

This converts your PDF into a high-resolution G3 TIFF. The code for prefix.ps is:

<< currentpagedevice /InputAttributes get
0 1 2 index length 1 sub {1 index exch undef } for
/InputAttributes exch dup 0 <</PageSize [0 0 612 1728]>> put
/Policies << /PageSize 3 >> >> setpagedevice

Another advantage of Ghostscript is that it's open source: you get both the C and the PostScript (ps) source code. Also, if you go with another tool, check which engine powers its PDF rendering; it may well be using Ghostscript under the hood, as LeadTools does, for instance.

Hope this helps, regards

serge_gubenko
A: 

You can use "PDF To Image Converter" to convert PDF files to CCITT G3 TIFF files. The tool supports the command line, so you can call it from your application. More information is available at http://www.convertzone.com/all/go-pdf%20to%20image%20converter-1-1.htm

regards
flyaga

flyaga