I need to find a certain key in a pdf file. As far as I know the only way to do that is to interpret a pdf as txt file. I want to do this in PHP without installing a addon/framework/etc.
Thanks
I need to find a certain key in a pdf file. As far as I know the only way to do that is to interpret a pdf as txt file. I want to do this in PHP without installing a addon/framework/etc.
Thanks
You can certainly open a PDF file as text. PDF file format is actually a collection of objects. There is a header in the first line that tells you the version. You would then go to the bottom to find the offset to the start of the xref table that tells where all the objects are located. The contents of individual objects in the file, like graphics, are often binary and compressed. The 1.7 specification can be found here.
You can't just open the file as it is a binary dump of objects used to create the PDF display, including encoding, fonts, text, images. I wrote an blog post explaining how text is stored at http://pdf.jpedal.org/java-pdf-blog/bid/27187/Understanding-the-PDF-file-format-text-streams
Thank you all for your help. I owe you this piece of code:
// Proceed if file exists
if(file_exists($sourcePath)){
$pdfFile = fopen($sourcePath,"rb");
$data = fread($pdfFile, filesize($sourcePath));
fclose($pdfFile);
// Check if file is encrypted or not
if(stripos($data,$searchFor)){ // $searchFor = "/Encrypt"
$counterEncrypted++;
}else{
$counterNotEncrpyted++;
}
}else{
$counterNotExisting++;
}