Hi there,

I have a pdf that was generated from scanning software. The pdf has 1 TIFF image per page. I want to extract the TIFF image from each page. I am using iTextSharp and I have successfully found the images and can get back the raw bytes from the PdfReader.GetStreamBytesRaw method. The problem is, as many before me have discovered, iTextSharp does not contain a PdfReader.CCITTFaxDecode method.

What else do I know? Even without iTextSharp I can open the pdf in Notepad and find the streams with /Filter /CCITTFaxDecode, and I know from the /DecodeParms that they use CCITT Group 4 encoding.

Does anyone out there know how I can get the CCITTFaxDecode filter images out of my pdf?

Cheers, Kahu

A: 

Perhaps you can try to uncompress the pdf with pdftk? The syntax is

pdftk infile.pdf output uncompressed.pdf uncompress

I don't have a CCITTFax encoded pdf here so I can't test it.

Patrick

A: 

The LibTiff.NET library (http://www.bitmiracle.com/libtiff/) and the example below should get you 99% of the way there:

// pd: the image stream's dictionary; pdfStream: the stream object itself (both obtained elsewhere).
string filter = pd.Get(PdfName.FILTER).ToString();
string width = pd.Get(PdfName.WIDTH).ToString();
string height = pd.Get(PdfName.HEIGHT).ToString();
string bpp = pd.Get(PdfName.BITSPERCOMPONENT).ToString();

switch (filter)
{
   case "/CCITTFaxDecode":

      // Raw bytes are still CCITT-encoded; they are written unchanged as a single TIFF strip.
      byte[] data = PdfReader.GetStreamBytesRaw((PRStream)pdfStream);

      // TIFFOpen, TIFFSetField, TIFFWriteRawStrip and TIFFClose stand for
      // libtiff-style wrapper functions that are not defined in this snippet.
      int tiff = TIFFOpen("example.tif", "w");
      TIFFSetField(tiff, (uint)BitMiracle.LibTiff.Classic.TiffTag.IMAGEWIDTH, (uint)Int32.Parse(width));
      TIFFSetField(tiff, (uint)BitMiracle.LibTiff.Classic.TiffTag.IMAGELENGTH, (uint)Int32.Parse(height));
      TIFFSetField(tiff, (uint)BitMiracle.LibTiff.Classic.TiffTag.COMPRESSION, (uint)BitMiracle.LibTiff.Classic.Compression.CCITTFAX4);
      TIFFSetField(tiff, (uint)BitMiracle.LibTiff.Classic.TiffTag.BITSPERSAMPLE, (uint)Int32.Parse(bpp));
      TIFFSetField(tiff, (uint)BitMiracle.LibTiff.Classic.TiffTag.SAMPLESPERPIXEL, 1);

      // Copy the managed buffer to unmanaged memory for the raw strip write.
      IntPtr pointer = Marshal.AllocHGlobal(data.Length);
      Marshal.Copy(data, 0, pointer, data.Length);
      TIFFWriteRawStrip(tiff, 0, pointer, data.Length);
      TIFFClose(tiff);
      Marshal.FreeHGlobal(pointer); // release the unmanaged copy

      break;
}
vbcrlfuser

A: 

Actually, vbcrlfuser's answer did help me, but the code was not quite correct for the version of BitMiracle LibTiff.NET that I downloaded. In that version, the equivalent code looks like this:

using iTextSharp.text.pdf;
using BitMiracle.LibTiff.Classic;

...
      // pd and pdfStream refer to the image stream, as in vbcrlfuser's answer.
      // raw holds the still CCITT-encoded bytes of that stream.
      byte[] raw = PdfReader.GetStreamBytesRaw((PRStream)pdfStream);

      Tiff tiff = Tiff.Open("C:\\test.tif", "w");
      tiff.SetField(TiffTag.IMAGEWIDTH, UInt32.Parse(pd.Get(PdfName.WIDTH).ToString()));
      tiff.SetField(TiffTag.IMAGELENGTH, UInt32.Parse(pd.Get(PdfName.HEIGHT).ToString()));
      tiff.SetField(TiffTag.COMPRESSION, Compression.CCITTFAX4);
      tiff.SetField(TiffTag.BITSPERSAMPLE, UInt32.Parse(pd.Get(PdfName.BITSPERCOMPONENT).ToString()));
      tiff.SetField(TiffTag.SAMPLESPERPIXEL, 1);
      // Write the encoded data as one strip without re-compressing it.
      tiff.WriteRawStrip(0, raw, raw.Length);
      tiff.Close();

Using the above code, I finally got a valid TIFF file in C:\test.tif. Thank you, vbcrlfuser!

Berend Engelbrecht
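
A note for readers who would rather not pull in a TIFF library: the answers above work because the CCITT Group 4 bytes stored in the PDF can be written unchanged as a TIFF strip, so in principle they only need a TIFF header in front of them. Below is a minimal sketch of that idea in plain C#. The CcittTiff class and its Wrap method are hypothetical names, not part of iTextSharp or LibTiff.NET. The sketch assumes a single-strip, 1 bit per pixel image, /K < 0 in /DecodeParms (Group 4), and the default /BlackIs1 of false. If the image comes out inverted, PhotometricInterpretation 1 instead of 0 may be what is needed.

```csharp
using System;
using System.IO;

// Hypothetical helper: wraps raw CCITT Group 4 bytes in a minimal
// little-endian, single-strip TIFF container.
static class CcittTiff
{
    public static byte[] Wrap(byte[] raw, uint width, uint height)
    {
        const ushort EntryCount = 8;
        // 8-byte header + 2-byte entry count + 12 bytes per entry + 4-byte next-IFD offset.
        const uint DataOffset = 8 + 2 + EntryCount * 12 + 4;

        using (var ms = new MemoryStream())
        using (var w = new BinaryWriter(ms)) // BinaryWriter always writes little-endian
        {
            w.Write((byte)'I');                  // "II" = little-endian byte order
            w.Write((byte)'I');
            w.Write((ushort)42);                 // TIFF magic number
            w.Write((uint)8);                    // offset of the first IFD

            w.Write(EntryCount);
            // Entries must appear in ascending tag order. Type 3 = SHORT, 4 = LONG.
            WriteEntry(w, 256, 4, width);            // ImageWidth
            WriteEntry(w, 257, 4, height);           // ImageLength
            WriteEntry(w, 258, 3, 1);                // BitsPerSample = 1
            WriteEntry(w, 259, 3, 4);                // Compression = CCITT Group 4
            WriteEntry(w, 262, 3, 0);                // PhotometricInterpretation = WhiteIsZero (assumed)
            WriteEntry(w, 273, 4, DataOffset);       // StripOffsets
            WriteEntry(w, 278, 4, height);           // RowsPerStrip = whole image
            WriteEntry(w, 279, 4, (uint)raw.Length); // StripByteCounts
            w.Write((uint)0);                    // no further IFDs

            w.Write(raw);                        // encoded image data, unchanged
            w.Flush();
            return ms.ToArray();
        }
    }

    // Writes one 12-byte IFD entry holding a single value.
    static void WriteEntry(BinaryWriter w, ushort tag, ushort type, uint value)
    {
        w.Write(tag);
        w.Write(type);
        w.Write((uint)1); // value count
        w.Write(value);   // SHORT values occupy the low two bytes of this field
    }
}
```

With the raw bytes and the /Width and /Height values from the image dictionary in hand, usage would be along the lines of File.WriteAllBytes("page1.tif", CcittTiff.Wrap(raw, width, height)), where the file name is only an example.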