I need to find a certain key in a PDF file. As far as I know, the only way to do that is to read the PDF as a text file. I want to do this in PHP without installing an add-on, framework, etc.

Thanks

A:

I found this function; I hope it helps.

http://community.livejournal.com/php/295413.html

Gazler
Also very helpful... thanks :)
Kel
A:

You can certainly open a PDF file as text. The PDF format is essentially a collection of objects. The first line is a header that states the version (for example, %PDF-1.7). You would then go to the end of the file, where the trailer gives the byte offset (the number after the startxref keyword) of the xref (cross-reference) table, which tells you where all the objects are located. The contents of individual objects, such as graphics, are often binary and compressed. The 1.7 specification can be found here.
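As a rough illustration of the layout described above, here is a minimal PHP sketch that reads the version from the header and the xref offset from the trailer. The function names `pdfVersion` and `pdfStartXref` are hypothetical, and the code assumes a conventional layout: the header at byte 0 and a `startxref` entry within the last 1024 bytes. An incrementally updated file can contain several `startxref` entries, so the sketch keeps the last one.

```php
<?php
// Hypothetical helpers; assumes "%PDF-x.y" at byte 0 and a "startxref <offset>"
// entry near the end of the file. Not a full PDF parser.

function pdfVersion(string $data): ?string
{
    // The header looks like "%PDF-1.7" on the first line.
    if (preg_match('/^%PDF-(\d\.\d)/', $data, $m)) {
        return $m[1];
    }
    return null;
}

function pdfStartXref(string $data): ?int
{
    // Only the tail of the file matters; 1024 bytes is an arbitrary window.
    // substr() returns the whole string if it is shorter than that.
    $tail = substr($data, -1024);
    // Capture every "startxref <number>"; the last one is the current xref.
    if (preg_match_all('/startxref\s+(\d+)/', $tail, $m)) {
        return (int) end($m[1]);
    }
    return null;
}
```

Usage: `$data = file_get_contents('example.pdf');` followed by `pdfVersion($data)` and `pdfStartXref($data)`. The returned offset is where the xref table (or, in newer files, an xref stream) begins.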

Tom Cabanski
Wow, thank you very much for your input. Do you by any chance have documentation on working with PDFs in PHP?
Kel
A: 

You can't simply read the file as plain text, because it is a binary dump of the objects used to create the PDF display, including encoding, fonts, text, and images. I wrote a blog post explaining how text is stored: http://pdf.jpedal.org/java-pdf-blog/bid/27187/Understanding-the-PDF-file-format-text-streams
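To see why plain-text searching often fails, it helps to decompress the streams. Below is a minimal PHP sketch, under stated assumptions, that extracts every `stream ... endstream` body and tries to inflate it with `gzuncompress()`, since FlateDecode data is in zlib format. The function name `pdfStreams` is hypothetical; the sketch ignores `/Length`, does not handle other filters (LZW, ASCII85, etc.), and returns streams it cannot inflate unchanged. It relies on PHP's zlib extension, which most builds include.

```php
<?php
// Hypothetical helper: crude stream extraction, not a full PDF parser.

function pdfStreams(string $data): array
{
    $streams = [];
    // "stream" is followed by LF or CRLF; "endstream" is preceded by an EOL.
    if (preg_match_all('/stream\r?\n(.*?)\r?\nendstream/s', $data, $m)) {
        foreach ($m[1] as $raw) {
            // gzuncompress() decodes zlib-format data; it returns false
            // (with a warning, suppressed here) if the data is not zlib.
            $inflated = @gzuncompress($raw);
            $streams[] = ($inflated === false) ? $raw : $inflated;
        }
    }
    return $streams;
}
```

Even after inflating, text operators such as `Tj` may carry glyph codes that depend on the font's encoding, so the decoded text is not always human-readable.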

mark stephens
Thank you very much!
Kel
A: 

Thank you all for your help. I owe you this piece of code:

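// $searchFor = "/Encrypt"
// Proceed only if the file exists
if (file_exists($sourcePath)) {
    // Read the whole file as binary data (also works for empty files,
    // unlike fread() with a length of 0)
    $data = file_get_contents($sourcePath);

    // An encrypted PDF has an /Encrypt entry in its trailer.
    // PDF names are case-sensitive, so use strpos(), and compare with
    // false because a match at offset 0 returns 0.
    if ($data !== false && strpos($data, $searchFor) !== false) {
        $counterEncrypted++;
    } else {
        $counterNotEncrypted++;
    }
} else {
    $counterNotExisting++;
}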
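The same check could be wrapped in a reusable function. This is a minimal sketch; the name `pdfContainsKey` is hypothetical. It returns null when the file is missing or unreadable, and otherwise reports whether the key occurs anywhere in the raw bytes. It remains a substring heuristic: a match inside an uncompressed content stream would also count.

```php
<?php
// Hypothetical wrapper around the encryption check above.

function pdfContainsKey(string $path, string $key = '/Encrypt'): ?bool
{
    if (!is_file($path)) {
        return null; // file does not exist
    }
    $data = file_get_contents($path);
    if ($data === false) {
        return null; // file could not be read
    }
    // PDF names are case-sensitive, so strpos() rather than stripos();
    // compare with false because a match at offset 0 returns 0.
    return strpos($data, $key) !== false;
}
```

A caller can then map `true`, `false`, and `null` onto `$counterEncrypted`, `$counterNotEncrypted`, and `$counterNotExisting`.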
Kel