Hello.

I have a problem in my image extractor code. Any suggestions are appreciated.

I first get a stream from the PDF and check each of its objects; if an object is an image, I save it. The code works and I can get all the images in the PDF. However, some of the extracted images are unusual: part of the image is missing (hidden/erased) and the file size is small (40 KB).

I am Persian. Please excuse my bad English. Thank you.

The code:

PdfReader reader = new PdfReader(pdf); 
{
    for (int i = 0; i < reader.XrefSize; i++) 
    {
        PdfObject pdfobj = reader.GetPdfObject(i); 
        if (pdfobj != null) 
        {
            if (pdfobj.IsStream()) 
            {
                PdfStream stream = (PdfStream)pdfobj; 
                PdfObject pdfsubtype = stream.Get(PdfName.SUBTYPE); 
                if (pdfsubtype != null) 
                {
                    // PDF Subtype OK 
                    if (pdfsubtype.ToString().Equals(PdfName.IMAGE.ToString())) 
                    {
                        //image found 
                        byte[] img = PdfReader.GetStreamBytesRaw((PRStream)stream);
                        if (img.Length >= 5 * 1024) 
                        {
                            // Write the bytes in one call; the file handle is closed even if the write fails.
                            File.WriteAllBytes(output + "\\img" + i.ToString() + ".bmp", img);
                        }
                    }
                }
            }
        }
    }
}
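
One way to narrow the problem down might be to check what the saved bytes really are. `GetStreamBytesRaw` returns the stream exactly as it is stored in the PDF, so the data may be JPEG or Flate-compressed pixel data rather than a BMP file. The helper below is only a minimal sketch: `ImageSniffer` and `GuessFormat` are hypothetical names (not part of iTextSharp), and it recognizes just a few common signatures.

```csharp
using System;

// Hypothetical helper, not part of iTextSharp: guesses what a byte buffer
// contains by looking at its leading signature bytes.
public static class ImageSniffer
{
    public static string GuessFormat(byte[] data)
    {
        if (data == null || data.Length < 4)
            return "unknown";

        // JPEG files start with the SOI marker FF D8.
        if (data[0] == 0xFF && data[1] == 0xD8)
            return "jpeg";

        // BMP files start with the ASCII characters "BM".
        if (data[0] == (byte)'B' && data[1] == (byte)'M')
            return "bmp";

        // PNG files start with 89 50 4E 47.
        if (data[0] == 0x89 && data[1] == 0x50 && data[2] == 0x4E && data[3] == 0x47)
            return "png";

        // A zlib (Flate) stream has a two-byte header whose low nibble is 8
        // (deflate) and whose 16-bit big-endian value is a multiple of 31.
        if ((data[0] & 0x0F) == 0x08 && ((data[0] << 8) | data[1]) % 31 == 0)
            return "zlib";

        return "unknown";
    }
}
```

Calling `ImageSniffer.GuessFormat(img)` just before the file is written would show whether the `.bmp` extension matches the data. A result of `"zlib"` or `"unknown"` would suggest the stream still needs decoding before any image viewer can display it fully.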