Given a PDF, how can I get its layout mode (or its relative width and height) using a PHP library or a Linux command-line tool?

I'm using TCPDF (http://www.tecnick.com/public/code/cp%5Fdpage.php?aiocp%5Fdp=tcpdf), which can set this variable on new PDFs, but I need it for existing PDFs produced by Adobe software.

I thought of converting the PDFs to PostScript, or using gs (Ghostscript) in some other way, such as rendering each PDF to an image first and reading the image's width and height. Is this the best way?

+1  A: 

It's a big gun, but I have no other suggestions: I have used the iText Java library for processing PDF files.

Note that, as far as I know, there is no such thing as a PDF-wide layout mode or size. A PDF is a collection of pages, each of which has a media box defining the size of the page to be printed. However, if a page does not define this property itself, it inherits it from its parent node in the page tree. See the PDF Reference for details.
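Once you have a page's effective media box, deciding its orientation is simple arithmetic. The sketch below assumes you have already extracted the box as `[llx, lly, urx, ury]` (in points) and the page's `/Rotate` value by some other means; `page_orientation()` is a hypothetical helper, not part of any library. `/Rotate` is a multiple of 90, and a quarter turn swaps the displayed width and height.

```php
<?php
// Hypothetical helper: classify a page from its box [llx, lly, urx, ury]
// (in points) and its /Rotate value.
function page_orientation(array $box, int $rotate = 0): string
{
    [$llx, $lly, $urx, $ury] = $box;
    $width  = abs($urx - $llx);
    $height = abs($ury - $lly);

    // /Rotate must be a multiple of 90; normalise negatives (e.g. -90 -> 270).
    $rotate = (($rotate % 360) + 360) % 360;
    if ($rotate === 90 || $rotate === 270) {
        // A quarter turn swaps the displayed width and height.
        [$width, $height] = [$height, $width];
    }

    if ($width > $height) {
        return 'landscape';
    }
    return $width < $height ? 'portrait' : 'square';
}

echo page_orientation([0, 0, 595, 842]), "\n";      // portrait (A4)
echo page_orientation([0, 0, 595, 842], 90), "\n";  // landscape
```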

Zed
The iText Java library looks like the equivalent of TCPDF. Thanks for the tip about PDFs, though.
Jonathan Hendler
+1  A: 

The solution I'm using is to have Ghostscript render the first page to an image and then read the image's dimensions:

$pdf = $complete_file_path.'/'.$this->pdffilename;
// Render page 1 only; "%d" in -sOutputFile becomes the page number, so the image is p1.png
$cmd = 'gs -dSAFER -dBATCH -dNOPAUSE -dFirstPage=1 -dLastPage=1 -sDEVICE=png16m -r400 -dTextAlphaBits=4 -dGraphicsAlphaBits=4 -sOutputFile='.escapeshellarg($complete_file_path.'/p%d.png').' '.escapeshellarg($pdf);
$result = $this->proc( $cmd ); // proc() is my own shell-execution helper
list($width, $height, $type, $attr) = getimagesize($complete_file_path.'/p1.png');
Jonathan Hendler
@Jonathan: This will miss all cases where the PDF contains pages of mixed size and mixed orientation. (And believe me, there are quite a few of those out there in the wild...)
pipitas
A: 

You cannot always rely on the results from the first page being the same for all the rest. I've seen enough mixed-format PDFs out there in the wild not to want to base any code on that assumption.

A more reliable way to determine the media size of each page (and even each of the embedded {Trim,Media,Crop,Bleed}Boxes) is the command-line tool pdfinfo (pdfinfo.exe on Windows), part of the Xpdf tools from http://www.foolabs.com/xpdf/download.html. Run it with the "-box" parameter to print the boxes; "-f 3" tells it to start at page 3 and "-l 8" to stop processing at page 8.
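To use this from PHP, one option is to run pdfinfo over a page range and parse its per-page "size" lines. This is a minimal sketch, assuming pdfinfo is on the PATH and prints lines in the "Page    N size: W x H pts" form shown in the example output; `pdfinfo_page_sizes()` and `run_pdfinfo()` are hypothetical helpers.

```php
<?php
// Hypothetical helper: map page number => [width, height] in points,
// parsed from `pdfinfo -f FIRST -l LAST` output.
function pdfinfo_page_sizes(string $output): array
{
    $sizes = [];
    // Lines look like: "Page    2 size: 297.646 x 419.524 pts"
    $re = '/^Page\s+(\d+)\s+size:\s+([\d.]+)\s+x\s+([\d.]+)\s+pts/m';
    preg_match_all($re, $output, $m, PREG_SET_ORDER);
    foreach ($m as $match) {
        $sizes[(int) $match[1]] = [(float) $match[2], (float) $match[3]];
    }
    return $sizes;
}

// Hypothetical helper: run pdfinfo on a page range (assumes it is on the PATH).
function run_pdfinfo(string $pdf, int $first, int $last): string
{
    $cmd = sprintf('pdfinfo -f %d -l %d %s', $first, $last, escapeshellarg($pdf));
    return (string) shell_exec($cmd); // null/false on failure becomes ""
}
```

For example, `pdfinfo_page_sizes(run_pdfinfo('document.pdf', 3, 8))` would return the sizes of pages 3 to 8 of a (hypothetical) `document.pdf`, one entry per page, so mixed-size documents are handled page by page.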

Example output:

C:\downloads>pdfinfo -box -f 1 -l 3 _IXUS_850IS_ADVCUG_EN.pdf
Creator:        FrameMaker 6.0
Producer:       Acrobat Distiller 5.0.5 (Windows)
CreationDate:   08/17/06 16:43:06
ModDate:        08/22/06 12:20:24
Tagged:         no
Pages:          146
Encrypted:      no
Page    1 size: 419.535 x 297.644 pts
Page    2 size: 297.646 x 419.524 pts
Page    3 size: 297.646 x 419.524 pts
Page    1 MediaBox:     0.00     0.00   595.00   842.00
Page    1 CropBox:     87.25   430.36   506.79   728.00
Page    1 BleedBox:    87.25   430.36   506.79   728.00
Page    1 TrimBox:     87.25   430.36   506.79   728.00
Page    1 ArtBox:      87.25   430.36   506.79   728.00
Page    2 MediaBox:     0.00     0.00   595.00   842.00
Page    2 CropBox:    148.17   210.76   445.81   630.28
Page    2 BleedBox:   148.17   210.76   445.81   630.28
Page    2 TrimBox:    148.17   210.76   445.81   630.28
Page    2 ArtBox:     148.17   210.76   445.81   630.28
Page    3 MediaBox:     0.00     0.00   595.00   842.00
Page    3 CropBox:    148.17   210.76   445.81   630.28
Page    3 BleedBox:   148.17   210.76   445.81   630.28
Page    3 TrimBox:    148.17   210.76   445.81   630.28
Page    3 ArtBox:     148.17   210.76   445.81   630.28
File size:      6888764 bytes
Optimized:      yes
PDF version:    1.4
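In this output the "size" value of each page matches its CropBox dimensions (for page 1, 506.79 - 87.25 = 419.54 and 728.00 - 430.36 = 297.64), while every MediaBox is 595 x 842 pts (A4 portrait). So which box you read decides the answer: page 1's CropBox is landscape, but its MediaBox is portrait. The sketch below parses the "-box" lines so each box can be inspected separately; `pdfinfo_boxes()` is a hypothetical helper, and it ignores `/Rotate`.

```php
<?php
// Hypothetical helper: parse `pdfinfo -box` output into
// $boxes[page][boxName] = ['width' => ..., 'height' => ..., 'orientation' => ...].
function pdfinfo_boxes(string $output): array
{
    $boxes = [];
    // Lines look like: "Page    1 CropBox:     87.25   430.36   506.79   728.00"
    $re = '/^Page\s+(\d+)\s+(MediaBox|CropBox|BleedBox|TrimBox|ArtBox):'
        . '\s+(-?[\d.]+)\s+(-?[\d.]+)\s+(-?[\d.]+)\s+(-?[\d.]+)/m';
    preg_match_all($re, $output, $m, PREG_SET_ORDER);
    foreach ($m as $match) {
        // Box values are llx, lly, urx, ury in points.
        $width  = abs((float) $match[5] - (float) $match[3]);
        $height = abs((float) $match[6] - (float) $match[4]);
        $boxes[(int) $match[1]][$match[2]] = [
            'width'       => $width,
            'height'      => $height,
            'orientation' => $width > $height ? 'landscape'
                           : ($width < $height ? 'portrait' : 'square'),
        ];
    }
    return $boxes;
}
```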
pipitas