Hello.
I have a problem with my PDF image extractor code. Any suggestions are appreciated.
The code walks through every object in the PDF's cross-reference table, checks whether the object is a stream with /Subtype /Image, and if so saves its bytes to a file. It runs without errors and finds all the images in the PDF. However, some of the saved images look wrong: part of the picture is missing (as if hidden or erased), and these files are small (40 KB).
I am Persian, so please excuse my English. Thank you.
The code:
    // Requires: using System.IO; using iTextSharp.text.pdf;
    // pdf = path of the input PDF, output = folder for the extracted images
    PdfReader reader = new PdfReader(pdf);
    try
    {
        for (int i = 0; i < reader.XrefSize; i++)
        {
            PdfObject pdfobj = reader.GetPdfObject(i);
            if (pdfobj == null || !pdfobj.IsStream())
                continue;

            // Keep only image XObjects (/Subtype /Image)
            PdfStream stream = (PdfStream)pdfobj;
            if (!PdfName.IMAGE.Equals(stream.GetAsName(PdfName.SUBTYPE)))
                continue;

            // The raw bytes are still encoded with the stream's /Filter, so they are
            // never a BMP file. With a single DCTDecode filter they are a complete JPEG.
            byte[] img = PdfReader.GetStreamBytesRaw((PRStream)stream);
            string ext = PdfName.DCTDECODE.Equals(stream.GetAsName(PdfName.FILTER)) ? ".jpg" : ".bin";

            // Skip images smaller than 5 KB
            if (img.Length >= 5 * 1024)
                File.WriteAllBytes(Path.Combine(output, "img" + i + ext), img);
        }
    }
    finally
    {
        reader.Close();
    }
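A point worth checking: `GetStreamBytesRaw` returns the stream bytes still encoded with whatever /Filter the image uses, so they form a usable image file only for some filters, and never a BMP. The sketch below is a hypothetical helper (`ImageFilterInfo.Classify` is not part of iTextSharp) that assumes you pass in the /Filter names as strings. It shows how the filter chain decides what the raw bytes actually are. It does not by itself explain the partially missing images, but it narrows down which saved files can be trusted as-is.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not an iTextSharp API: given the /Filter chain of an image
// XObject, suggest a file extension and report whether the raw (undecoded) stream
// bytes already form a standalone image file.
public static class ImageFilterInfo
{
    public static (string Extension, bool RawIsStandaloneFile) Classify(IEnumerable<string> filterNames)
    {
        // Accept names as printed by PdfName.ToString() ("/DCTDecode") or bare ("DCTDecode").
        List<string> chain = filterNames.Select(n => n.TrimStart('/')).ToList();

        // No filter: the stream holds uncompressed pixel samples with no file header.
        if (chain.Count == 0)
            return (".raw", false);

        // Filters in an array are applied in order when decoding, so the last one
        // produces the image data. The raw bytes are a file only if it is the sole filter.
        string codec = chain[chain.Count - 1];
        bool alone = chain.Count == 1;

        switch (codec)
        {
            case "DCTDecode":
                return (".jpg", alone);        // JPEG data
            case "JPXDecode":
                return (".jp2", alone);        // JPEG 2000 data
            case "JBIG2Decode":
                return (".jbig2", false);      // embedded JBIG2 has no file header and may need /JBIG2Globals
            case "CCITTFaxDecode":
                return (".tif", false);        // needs a TIFF header built from /DecodeParms
            default:
                return (".raw", false);        // e.g. FlateDecode, LZWDecode: decoded output is bare pixels
        }
    }
}
```

In the loop above, the names can be read from `stream.Get(PdfName.FILTER)`, which holds either a single `PdfName` or a `PdfArray` of names, and is absent when the image is uncompressed. For the `.raw` cases, `PdfReader.GetStreamBytes` removes general-purpose filters such as FlateDecode, but the result is still bare pixel data that has to be encoded into a real image format using the image's /Width, /Height, /BitsPerComponent and /ColorSpace entries.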