I'm writing my thesis in LaTeX, generating it with pdflatex. I have a large number of figures, many of which are bitmaps (as opposed to SVG) in PNG/JPEG format. I've generally created them to be fairly high resolution (say 1600x1200-ish) to ensure that whatever size they end up in the document, they'll be at least 300dpi when printed.

As I'm writing/laying out the document, I'm including graphics (using \includegraphics from the graphicx package) and setting widths/heights as appropriate (e.g. subfigures are quite small). I don't need the images to be any more than about 300 dpi at best, so where I have shrunk a 1600x1200 image down to say 5cm, the image is now at 800 dpi. So despite including some very small (on the page) images, the PDF is becoming quite large.

Is there a way to tell pdflatex or graphicx (or something else involved?) to convert all images to a maximum of 300 dpi, based on the dimensions I'm setting with say \includegraphics[width=2in]{filename}? i.e. so it scales the image to a max of 600x600 pixels as it includes it in the PDF (leaving the original file untouched).

I know I can resize the original images with various command-line applications, and include the pre-resized versions, but given the images vary in size considerably, it wouldn't be as simple as making sure they're all 300 dpi for a constant printed size. It'd also be nice to be able to easily create different versions of the PDF (web vs final print) without resizing images manually, so that the 'web' PDF caps images at say 72-100 dpi while the final print one caps at 600 (if at all).

Update: I've found a poor solution to the problem in some cases (quite long so I've added it as an answer myself, see below). By reprocessing the PDF after creation, you can convert all images to JPEGs at a specific resolution (in dpi). This is good for docs with only JPEGs, but sucks if you have PNG (or similar non-lossy-compression) diagrams which are PNG for a reason (e.g. simple diagrams with hard edges).

So I'm still after a solution that can do this without forcing the same compression algorithm on all images (whether it's a LaTeX macro/package, or a script that parses the LaTeX and produces appropriately sized files, or a post-processor that doesn't take a one-compression-algorithm-fits-all approach).

+1  A: 

I've found one solution is to post-process the PDF, re-saving it in an application which will resize the images in the PDF to a maximum resolution.

I believe Adobe Acrobat (the Pro version, not the Reader) can do this (perhaps only on Windows? I'm not 100% sure, I don't have access to it).

On Mac OS X, I've found this post on the Apple Support forums which details a method of using Quartz filters and the Preview app (PDF/image viewer) to let you save a PDF with a reduced maximum image resolution for the whole document (75,150,300,600 dpi). The filter converts ALL images in the document to JPEG with the specified resolution (dpi) and specified quality (low/average).

So you can't leave certain images alone (e.g. diagrams in GIF/PNG form that are already small in file size, which may look awful as JPEGs), which would be the only real complaint using this method. There doesn't seem to be a way of preserving the original image format.

Set-up (required once only of course, instructions copied from the original post):

  1. get the filters from here in a .zip file
  2. unzip the archive, it'll create a 'Filters' folder
  3. move the 'Filters' folder into your ~/Library folder

After compiling a PDF, reduce unnecessarily high image resolutions:

  1. open the PDF in Preview.app
  2. select Save As... from the file menu
  3. Select the appropriate resolution in the 'Quartz Filter' drop-down (e.g. "Reduce to 300 dpi average quality")
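
If you want to script this rather than clicking through Preview, older versions of Mac OS X also shipped a small command-line helper that applies a Quartz filter to a PDF. The path and argument order below are from memory, so treat this as a sketch and check what your OS version actually provides (the .qfilter filename has to match whichever filter you installed in step 3):

  /System/Library/Printers/Libraries/quartzfilter thesis.pdf \
      "$HOME/Library/Filters/Reduce to 300 dpi average quality.qfilter" thesis-300dpi.pdf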
drfrogsplat
+1  A: 

You can change the resolution automatically using ImageMagick's convert -resample. Here is an awk script that parses the tex file, finds \includegraphics statements, and converts the images referenced there to the required resolution:

# lines that include a PNG/JPG graphic: resample the image and rewrite the path
(/\\includegraphics/ && (/{.*\.png}/ || /{.*\.jpg}/)) {
 if (match($0, "width=[0-9.]*in") == 0) {
    # no width given in inches -> leave the line alone, just flag it
    print "%Cannot convert, no width in inches!\n" $0;
 } else {
    wid = substr($0, RSTART+6, RLENGTH-8);                       # width in inches
    pixSize = wid*reso;                                          # target width in pixels
    match($0, "{.*}"); file = substr($0, RSTART+1, RLENGTH-2);   # image filename
    print "%convert " file " -resample " reso "x" reso " -resize " pixSize " RESAMPLED_" file;
    system("convert " file " -resample " reso "x" reso " -resize " pixSize " RESAMPLED_" file);
    gsub("{.*}", "{RESAMPLED_" file "}", $0); print $0;          # point LaTeX at the resampled copy
 }
}

# everything else is passed through unchanged
!(/\\includegraphics/ && (/{.*\.png}/ || /{.*\.jpg}/)) {print $0}

You can use it like so (assuming it is saved as resample.awk):

$ awk -f resample.awk -v reso=300 latexfile.tex > resampledlatexfile.tex
$ pdflatex resampledlatexfile.tex ...

Note that for this to work properly, \includegraphics needs to have its width parameter specified, and specified in inches; pictures that don't satisfy this are skipped (left unchanged).
The script creates a series of resampled images (with a RESAMPLED_ prefix), which are the ones referenced from resampledlatexfile.tex.
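
To make the transformation concrete, here is what happens to a hypothetical line (assuming figure.png sits in the working directory and reso=300). An input line

\includegraphics[width=3in]{figure.png}

makes the script run

convert figure.png -resample 300x300 -resize 900 RESAMPLED_figure.png

and the rewritten .tex file then contains

%convert figure.png -resample 300x300 -resize 900 RESAMPLED_figure.png
\includegraphics[width=3in]{RESAMPLED_figure.png}

so the document references the RESAMPLED_ copy while the original image stays untouched.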

mbq
Thanks, this looks like a good solution, though I'm toying with finrod's for now as I like the idea of latex calling 'convert' rather than having a separate script
drfrogsplat
+2  A: 

I've tried a little black magic with LaTeX, and found out how to do it without external scripts, though it still is somewhat dirty and not that easy to use. Basically, you would need to call ImageMagick from inside LaTeX.

To do this, you need to run pdflatex with the --shell-escape parameter. This enables a command named \write18, which executes the parameter passed to it as a shell command. Then you can add the following code to the preamble:

% run ImageMagick: resample #1 to 300 dpi, resize it to #2 pixels,
% and save the result as a copy named #1.rs
\newcommand{\doConvert}[2]{\immediate\write18{convert #1 -resample %
  300x300 -resize #2 #1.rs}}
% convert file #1.#2, then include the resampled copy (#1.#2.rs)
% at the requested width/height
\newcommand{\imgRs}[4]{\doConvert{#1.#2}{#3x#4}%
  \includegraphics[width=#3, height=#4, type=#2, ext=.#2.rs, read=.#2.rs]{#1}%
}

This enables you to call \imgRs{file}{extension}{width_in_pts}{height_in_pts} in order to convert the image to 300 dpi and include it in the document.

Of course you could always specify the dimensions in inches or any other unit: you would just have to change the call to \doConvert in \imgRs to account for the scaling from the unit used to points -- and you would have to use the dimensions consistently. It could be possible to actually find out what units were used, or to make some parameters optional, but that's independent of this solution.
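
As a rough sketch of that adjustment (untested, and limited to whole-inch dimensions because \numexpr only does integer arithmetic), a variant of \imgRs that takes the width and height in inches and derives the pixel size at 300 dpi itself could look like:

% hypothetical helper: #3 and #4 are whole inches
\newcommand{\imgRsIn}[4]{%
  \doConvert{#1.#2}{\the\numexpr#3*300\relax x\the\numexpr#4*300\relax}%
  \includegraphics[width=#3in, height=#4in, type=#2, ext=.#2.rs, read=.#2.rs]{#1}%
}

so that \imgRsIn{figures/plot}{png}{3}{2} would downsample figures/plot.png to at most 900x600 pixels and include it at 3in by 2in.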

finrod
awesome, didn't realise i could call convert from within latex like this! this seems to work with some minor modifications (it's really not liking the .rs extension for some reason despite the ext/type/read args), but i've got it auto-resizing and including the resized images so I can go from there, thanks! I'll post some updated more generic code when I'm done messing it up (:
drfrogsplat
How about this? `\newcommand{\doConvert}[2]{\immediate\write18{mkdir -p downsampled; convert #1 -resample 300x300 -resize #2 downsampled/#1}}\newcommand{\imgRs}[4]{\doConvert{#1.#2}{#3x#4}\includegraphics[width=#3, height=#4, type=#2, ext=.#2, read=.#2]{downsampled/#1}}`
Ken Bloom
+1  A: 

Try the degrade package, which performs the ImageMagick trick that others have written about.

Ken Bloom
Not good, since it does not touch pngs.
mbq
The `degrade.sh` script can probably be edited to not hard code the `.jpg` extension, though then you're rewriting anyway. Why not take degrade's idea of using a separate directory for the downsampled images (instead of changing the name), and hack that into finrod's solution? Add a `\write18{mkdir -p downsampled}` to the `\doConvert` command to ensure that the directory is created if it doesn't already exist.
Ken Bloom
+1  A: 

You can post-process the PDF and still use lossless compression. I have also had to learn a lot about manipulating PDF files for / from LaTeX when writing my thesis. Basically, you can use GhostScript and tell it what compression method you want for, say, color images (Win32 example, but you get the idea):

gswin32c.exe -sDEVICE=pdfwrite -dMaxSubsetPct=100 -dPDFSETTINGS=/prepress -dAutoFilterColorImages=false -dColorImageFilter=/FlateEncode -sOutputFile="outfile.pdf" -dNOPAUSE -dBATCH "infile.pdf"

You need to specify both -dAutoFilterColorImages=false (don't automatically choose the compression method) and -dColorImageFilter=/FlateEncode (use the Flate encoder).

You need to specify the corresponding parameters for Gray images as well if you want to control those.
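
For example, a command that forces Flate (lossless) compression for both color and gray images might look like this (same switches as above, plus the gray-image equivalents):

gswin32c.exe -sDEVICE=pdfwrite -dMaxSubsetPct=100 -dPDFSETTINGS=/prepress -dAutoFilterColorImages=false -dColorImageFilter=/FlateEncode -dAutoFilterGrayImages=false -dGrayImageFilter=/FlateEncode -sOutputFile="outfile.pdf" -dNOPAUSE -dBATCH "infile.pdf"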

Ghostscript resamples to a given dpi based on -dPDFSETTINGS:

  /prepress = 300 dpi
  /printer = 300 dpi
  /ebook = 150 dpi
  /screen = 72 dpi
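
If the preset resolutions don't match what you want (for example the web vs. print versions mentioned in the question), the downsampling can also be controlled explicitly rather than through -dPDFSETTINGS; the switches below are standard pdfwrite/Distiller parameters, so something like

gswin32c.exe -sDEVICE=pdfwrite -dDownsampleColorImages=true -dColorImageResolution=100 -dDownsampleGrayImages=true -dGrayImageResolution=100 -sOutputFile="web.pdf" -dNOPAUSE -dBATCH "infile.pdf"

would cap color and gray images at roughly 100 dpi for a small 'web' copy.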

I have written a short tutorial on my website about manipulating and compressing PDF files, to explain this in more detail.

Peter Yu
Thanks for this info, it's certainly an easier solution than having LaTeX macros to modify files on the fly, as it just filters the PDF for the required output (much like the Quartz Filter method for Mac). It also seems to suffer the same problem as the Quartz Filter method, in that I can't work out how to tell it to keep lossless PNG files as PNG while lossy JPEGs remain as JPEGs... I'm starting to wonder if the PDF creation actually maintains the original image files or whether some conversion takes place so they become indistinguishable when re-filtering it...
drfrogsplat
Also, one oddity I found was that using the 'printer' profile changed some background shading from a light yellow colour to a light blue colour, while this didn't happen when using the 'prepress' or 'ebook' profiles.
drfrogsplat
I have tried to find ways to make GhostScript use the compression method of the original image when recreating the file (so PNGs are left lossless, JPEGs are left as JPEGs). I tried, as you suggested on my website, to leave out -dColorImageFilter. When I do that, I see compression artifacts on my PNGs as well, at least in my version of GhostScript (GPL, 8.71). So for that version of GS at least, it uses a uniform compression method for all images of a certain type (color, gray, mono).
Peter Yu