I need a way to count the number of pages in a PDF using PHP. I've done a bit of Googling, and everything I've found relies on shell/bash scripts, Perl, or other languages, but I need something in native PHP. Are there any libraries or examples of how to do this?

+3  A: 

Hi

You could try FPDI (see here). When you set the source file, setSourceFile() returns the number of pages in it.

regards, Lothar

lothar42
Unkwntech
I tried some PDFs with this, but ImageMagick seems more reliable. With many PDFs I get: FPDF error: This document (test_1.pdf) probably uses a compression technique which is not supported by the free parser shipped with FPDI.
Chris
I have the same error message as @Chris with FPDI. Some of the PDFs have been generated with Adobe Pro 8/9.
neoneye
+2  A: 

You can use the ImageMagick extension for PHP. ImageMagick understands PDFs, and its identify command can be used to extract the number of pages. The corresponding PHP method is Imagick::identifyImage().
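
A minimal sketch of the ImageMagick route, assuming the Imagick extension is installed and that ImageMagick can read PDFs on the server (this typically requires Ghostscript). It relies on Imagick::getNumberImages(), since each PDF page is loaded as one image in the sequence. The function name countPdfPagesWithImagick is made up for illustration.

```php
<?php
// Hypothetical helper: page count via Imagick, or null if it cannot be determined.
function countPdfPagesWithImagick(string $path): ?int
{
    if (!class_exists('Imagick')) {
        return null; // Imagick extension is not loaded
    }
    try {
        $im = new Imagick($path);        // each PDF page becomes one image
        $pages = $im->getNumberImages();
        $im->clear();                    // release the resources held by the object
        return $pages;
    } catch (ImagickException $e) {
        return null; // unreadable file, PDF support unavailable, etc.
    }
}
```

Returning null instead of throwing lets the caller fall back to another method when the extension or PDF support is missing.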

Travis Beale
A: 

Try this:

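<?php
// Count the pages of a PDF by scanning the file for /Count entries.
// Note: opening a path taken straight from the request is unsafe; validate it first.
$file = $_REQUEST['file'];
if (!$fp = @fopen($file, "rb")) {
        echo 'failed opening file '.htmlspecialchars($file);
}
else {
        $max = 0;
        // Read whole lines so a "/Count N" entry is never split across chunks.
        while (($line = fgets($fp)) !== false) {
                // A line may hold several /Count entries; keep the largest.
                if (preg_match_all('/\/Count\s+([0-9]+)/', $line, $matches)) {
                        $max = max($max, max(array_map('intval', $matches[1])));
                }
        }
        fclose($fp);
        echo 'There '.($max == 1 ? 'is ' : 'are ').$max.' page'.($max == 1 ? '' : 's').' in '.htmlspecialchars($file).'.';
}
?>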

Each node in the PDF's page tree has a /Count entry giving the number of pages beneath it. The root node's /Count is the sum of its children's, so this script just looks for the largest value, which is the total number of pages.
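
To see why taking the largest /Count works, here is a small sketch that runs the same scan over an in-memory string. The helper name maxCountInPdfText and the sample page-tree fragment are made up for illustration. Real PDFs may keep these dictionaries inside compressed streams, where a plain-text scan like this cannot see them.

```php
<?php
// Hypothetical helper: return the largest /Count value found in raw PDF text.
function maxCountInPdfText(string $data): int
{
    if (!preg_match_all('/\/Count\s+([0-9]+)/', $data, $matches)) {
        return 0; // no /Count entries visible in the text
    }
    return max(array_map('intval', $matches[1]));
}

// Made-up page tree: a root node with two child nodes holding 3 and 2 pages.
$sample = <<<PDF
1 0 obj << /Type /Pages /Kids [2 0 R 3 0 R] /Count 5 >> endobj
2 0 obj << /Type /Pages /Parent 1 0 R /Kids [4 0 R 5 0 R 6 0 R] /Count 3 >> endobj
3 0 obj << /Type /Pages /Parent 1 0 R /Kids [7 0 R 8 0 R] /Count 2 >> endobj
PDF;

echo maxCountInPdfText($sample), "\n"; // prints 5
```

The root's value (5) is the sum of its children's (3 + 2), so it is always the largest one in the file.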

Baboum
A: 

That script works for me most of the time, but I do have some PDFs without Page/Count tags. I didn't create the original PDF. I have tried recreating it by printing to PDF from Acrobat, but the result still does not contain the tags. Does anyone have any ideas on how to add them properly? I tried some quick manual additions, but that broke the PDF.

adrianbj
+1  A: 

I actually went with a combined approach. Since exec is disabled on my server, I wanted to stick with a PHP-based solution, and ended up with this:

Code:

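function getNumPagesPdf($filepath){
    // Strip any "[...]" suffix (e.g. an ImageMagick frame selector) before opening the file.
    $fp = @fopen(preg_replace("/\[(.*?)\]/", "", $filepath), "rb");
    $max = 0;
    if ($fp) {
        // First pass: look for the largest /Count entry in the raw file.
        while (($line = fgets($fp)) !== false) {
            if (preg_match_all('/\/Count\s+([0-9]+)/', $line, $matches)) {
                $max = max($max, max(array_map('intval', $matches[1])));
            }
        }
        fclose($fp);
    }
    if ($max == 0) {
        // No /Count entries found: fall back to the slower Imagick extension.
        $im = new Imagick($filepath);
        $max = $im->getNumberImages();
    }

    return $max;
}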

If the scan finds no Count tags, the function falls back to the Imagick PHP extension. I use this two-step approach because the Imagick route is quite slow.

adrianbj