I need a way to count the number of pages in a PDF from PHP. I've done a bit of Googling, and everything I've found relies on shell/bash scripts, Perl, or other languages, but I need something in native PHP. Are there any libraries or examples of how to do this?
Hi
You could try FPDI (see here). When you set the source file with setSourceFile(), the return value is the number of pages in that file.
regards, Lothar
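To make the FPDI suggestion concrete, here is a minimal sketch. It assumes FPDI 2.x installed through Composer (with an FPDF or TCPDF backend) and the autoloader already required; countPagesWithFpdi() is a hypothetical helper name, not part of the library. It returns null rather than throwing when the library is missing or the file cannot be parsed.

```php
<?php
// Sketch only: assumes FPDI 2.x is available via Composer's autoloader.
// countPagesWithFpdi() is a made-up helper name for illustration.
function countPagesWithFpdi(string $path): ?int
{
    if (!class_exists('\setasign\Fpdi\Fpdi')) {
        return null; // FPDI is not installed or not autoloadable
    }
    try {
        $pdf = new \setasign\Fpdi\Fpdi();
        // setSourceFile() returns the number of pages in the source PDF.
        return $pdf->setSourceFile($path);
    } catch (\Throwable $e) {
        return null; // missing file or a PDF the parser cannot read
    }
}
```

Usage would be something like `$pages = countPagesWithFpdi('/path/to/file.pdf');`, checking for null before using the result.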
You can use the ImageMagick extension for PHP. ImageMagick understands PDFs, and you can use the identify
command to extract the number of pages. In PHP, Imagick::identifyImage() returns the identify data for the loaded image, and Imagick::getNumberImages() gives the page count.
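A rough sketch of the Imagick route follows. It assumes the imagick extension is loaded and that ImageMagick can read PDFs on the server (this normally needs Ghostscript). countPagesWithImagick() is a hypothetical helper name. It uses pingImage(), which reads image attributes without keeping full pixel data, so it may be lighter than a full load, though that is not guaranteed for every PDF.

```php
<?php
// Sketch only: countPagesWithImagick() is a made-up helper name.
function countPagesWithImagick(string $path): ?int
{
    if (!extension_loaded('imagick')) {
        return null; // extension not available on this server
    }
    try {
        $im = new Imagick();
        $im->pingImage($path);            // read attributes, not full pixel data
        $pages = $im->getNumberImages();  // one image per PDF page
        $im->clear();                     // release ImageMagick resources
        return $pages;
    } catch (\Throwable $e) {
        return null; // file missing, unreadable, or no PDF delegate
    }
}
```

Returning null instead of throwing keeps the caller's fallback logic simple; a stricter version could let the ImagickException propagate.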
Try this:
<?php
// Demo only: validate the path before passing request input to fopen() in real code.
$file = $_REQUEST['file'];
if (!$fp = @fopen($file, "rb")) {
    echo 'failed opening file '.htmlspecialchars($file);
}
else {
    $max = 0;
    // Read whole lines so a "/Count N" entry is never split across two reads.
    while (($line = fgets($fp)) !== false) {
        if (preg_match('/\/Count\s+([0-9]+)/', $line, $matches)) {
            // Compare as integers and keep the largest count seen so far.
            $max = max($max, (int)$matches[1]);
        }
    }
    fclose($fp);
    echo 'There '.($max==1?'is ':'are ').$max.' page'.($max==1?'':'s').' in '.htmlspecialchars($file).'.';
}
?>
The /Count entry in each page-tree node gives the number of pages under that node. The root node's /Count is the sum of its children's counts, so the script simply looks for the largest value, which is the total number of pages.
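The max-of-/Count idea can be checked on a hand-written page tree. This is only a sketch: maxCountEntry() and the $sample fragment are made up for illustration, and the fragment is not a complete PDF. Note two limits of the heuristic: outline (bookmark) dictionaries also use /Count, and PDFs that store their objects in compressed streams will not expose the entry as plain text.

```php
<?php
// Sketch only: maxCountEntry() is a hypothetical helper.
// Returns the largest "/Count N" value found in the raw data, or 0 if none.
function maxCountEntry(string $pdfData): int
{
    $max = 0;
    if (preg_match_all('/\/Count\s+(\d+)/', $pdfData, $m)) {
        $max = max(array_map('intval', $m[1]));
    }
    return $max;
}

// Hand-written fragment: a root node with 5 pages split over two child nodes.
$sample = <<<PDF
1 0 obj << /Type /Pages /Count 5 /Kids [2 0 R 3 0 R] >> endobj
2 0 obj << /Type /Pages /Count 3 /Parent 1 0 R /Kids [4 0 R 5 0 R 6 0 R] >> endobj
3 0 obj << /Type /Pages /Count 2 /Parent 1 0 R /Kids [7 0 R 8 0 R] >> endobj
PDF;

echo maxCountEntry($sample), "\n"; // prints 5
```

For a real file, the input could come from file_get_contents(), at the cost of holding the whole file in memory.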
That script works for me most of the time, but I do have some PDFs without any Page/Count tags. I didn't create the original PDF. I tried recreating it by printing to PDF from Acrobat, but the result still does not contain the tags. Does anyone have any ideas on how to add them properly? I tried a few quick manual additions, but that broke the PDF.
I actually went with a combined approach. Since I have exec disabled on my server, I wanted to stick with a PHP-based solution, so I ended up with this:
Code:
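function getNumPagesPdf($filepath){
    // Strip any "[...]" suffix (e.g. an ImageMagick-style page selector) before opening the file.
    $fp = @fopen(preg_replace("/\[(.*?)\]/", "", $filepath), "rb");
    $max = 0;
    if ($fp) { // skip the scan if the file could not be opened
        while (($line = fgets($fp)) !== false) {
            if (preg_match('/\/Count\s+([0-9]+)/', $line, $matches)) {
                // Keep the largest /Count value, compared as integers.
                $max = max($max, (int)$matches[1]);
            }
        }
        fclose($fp);
    }
    if ($max == 0) {
        // No /Count entry found: fall back to the slower Imagick extension.
        $im = new Imagick($filepath);
        $max = $im->getNumberImages();
    }
    return $max;
}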
If the scan finds no Count tags, the function falls back to the Imagick PHP extension. I use this two-step approach because the Imagick route is quite slow.