I have an input PDF file (usually, but not always, generated by pdfTeX) that I want to convert to an output PDF which is visually equivalent at any resolution, keeps the same metadata (Unicode text info, hyperlinks, outlines, etc.), and has the smallest possible file size.

I know about the following methods:

  • java -cp Multivalent.jar tool.pdf.Compress input.pdf (from http://multivalent.sourceforge.net/). This recompresses all streams, removes unused objects, unifies equivalent objects, compresses whitespace, removes default values, compresses the cross-reference table.
  • Recompressing suitable images with jbig2 and PNGOUT.
  • Re-encoding Type1 fonts as CFF fonts.
  • Unifying equivalent images.
  • Unifying subsets of the same font to a bigger subset.
  • Removing fillable forms.
  • When distilling or otherwise converting (e.g. gs -sDEVICE=pdfwrite), make sure it doesn't degrade image quality, and doesn't increase (!) the image sizes.
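
For the "unifying equivalent objects/images" items, one possible starting point is to look for streams whose dictionary and data are byte-identical. This is only a minimal Python sketch: it assumes some PDF parser has already extracted the streams into a mapping from object number to (dictionary bytes, stream data), and the function name and input layout are made up for illustration. Images that look the same but are encoded differently will not be detected, and a real tool would still have to rewrite all references to the dropped objects.

```python
import hashlib
from collections import defaultdict


def find_duplicate_streams(streams):
    """Group byte-identical streams.

    streams: dict mapping object number -> (dictionary_bytes, data_bytes).
    Returns a dict mapping the lowest object number of each group to the
    list of other object numbers that could be redirected to it.
    """
    groups = defaultdict(list)
    for number in sorted(streams):
        dictionary, data = streams[number]
        # Hash the two parts separately so their boundary is unambiguous.
        key = (hashlib.sha256(dictionary).digest(), hashlib.sha256(data).digest())
        groups[key].append(number)
    return {nums[0]: nums[1:] for nums in groups.values() if len(nums) > 1}
```

With three hypothetical image streams, two of which are identical, the result maps the object to keep to the objects that can be replaced by it.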

I know about the following techniques, but they don't apply in my case, since I already have a PDF:

  • Use smaller and/or fewer fonts.
  • Use vector images instead of bitmap images.

Do you have any other ideas for optimizing a PDF?

+5  A: 

Optimize PDF Files

Avoid Refried Graphics

For graphics that must be inserted as bitmaps, prepare them for maximum compressibility and minimum dimensions. Use the best quality images that you can at the output resolution of the PDF. Inserting compressed JPEGs into PDFs and Distilling them may recompress JPEGs, which can create noticeable artifacts. Use black and white images and text instead of color images to allow the use of the newer JBIG2 standard that excels in monochromatic compression. Be sure to turn off thumbnails when saving PDFs for the Web.

Use Vector Graphics

Use vector-based graphics wherever possible for images that would normally be made into GIFs. Vector images scale perfectly, look marvelous, and their mathematical formulas usually take up less space than bitmapped graphics that describe every pixel (although there are some cases where bitmap graphics are actually smaller than vector graphics). You can also compress vector image data using ZIP compression, which is built into the PDF format. Acrobat Reader version 5 and 6 also support the SVG standard.

Minimize Fonts

How you use fonts, especially in smaller PDFs, can have a significant impact on file size. Minimize the number of fonts you use in your documents to minimize their impact on file size. Each additional fully embedded font can easily take 40K in file size, which is why most authors create "subsetted" fonts that only include the glyphs actually used.

Flatten Fat Forms

Acrobat forms can take up a lot of space in your PDFs. New in Acrobat 8 Pro you can flatten form fields in the Advanced -> PDF Optimizer -> Discard Objects dialog. Flattening forms makes form fields unusable and form data is merged with the page. You can also use PDF Enhancer from Apago to reduce forms by 50% by removing information present in the file but never actually used. You can also combine a refried PDF with the old form pages to create a hybrid PDF in Acrobat (see "Refried PDF" section below).

see article

Adinochestva
+1 vote. Thanks for the link and the excerpt. Although it didn't help me to make my PDFs even smaller (all of the techniques it mentioned are in use or don't apply to my case), it gives a good insight.
pts
+1  A: 

Version 1.5 of the PDF specification introduced two new methods of compression: object streams and cross-reference streams.

You mention that the Multivalent.jar compress tool compresses the cross reference table. This usually means the cross reference table is converted into a stream and then compressed.

The format of this cross-reference stream is not fixed. Its /W array sets the width, in bytes, of each of the three fields ("columns") of an entry, so a tool can pick the narrowest widths that fit the data. It's also possible to pre-process the stream data with a predictor function before compression, which usually improves the compression ratio. If you look inside the PDF with a text editor, you might be able to find the /Predictor entry in the /DecodeParms dictionary of the cross-reference stream to check whether the tool you're using takes advantage of this feature.
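
To get a feel for what the predictor does to cross-reference data, here is a rough Python sketch. It assumes field widths of 1, 3 and 1 bytes (i.e. /W [1 3 1]) and the PNG "Up" filter, which would correspond to /DecodeParms << /Predictor 12 /Columns 5 >>. The function names and the sample offsets are invented for illustration; this is not how any particular tool implements it.

```python
import zlib

WIDTHS = (1, 3, 1)      # assumed /W [1 3 1]: type, offset, generation
COLUMNS = sum(WIDTHS)   # bytes per cross-reference entry (/Columns)


def pack_rows(entries, widths=WIDTHS):
    """Serialize (type, field2, field3) tuples as fixed-width big-endian rows."""
    return [
        b"".join(value.to_bytes(width, "big") for value, width in zip(entry, widths))
        for entry in entries
    ]


def png_up_encode(rows, columns=COLUMNS):
    """Apply the PNG Up filter: store each byte minus the byte above it."""
    previous = bytes(columns)  # the row above the first row is all zeros
    out = bytearray()
    for row in rows:
        out.append(2)  # per-row filter tag, 2 = Up
        out.extend((cur - prev) & 0xFF for cur, prev in zip(row, previous))
        previous = row
    return bytes(out)


def png_up_decode(data, columns=COLUMNS):
    """Inverse of png_up_encode; only the Up filter is handled in this sketch."""
    previous = bytes(columns)
    rows = []
    for start in range(0, len(data), columns + 1):
        if data[start] != 2:
            raise ValueError("only the Up filter is handled in this sketch")
        delta = data[start + 1:start + 1 + columns]
        row = bytes((d + prev) & 0xFF for d, prev in zip(delta, previous))
        rows.append(row)
        previous = row
    return rows


if __name__ == "__main__":
    # Hypothetical entries: 200 in-use objects at steadily increasing offsets.
    entries = [(1, 15 + 137 * i, 0) for i in range(200)]
    rows = pack_rows(entries)
    plain = zlib.compress(b"".join(rows), 9)
    predicted = zlib.compress(png_up_encode(rows), 9)
    print("without predictor:", len(plain), "with predictor:", len(predicted))
```

Because neighbouring entries tend to differ only slightly, the filtered rows are usually far more repetitive than the raw ones, which is what Flate needs. How much is gained depends on the actual offsets in the file.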

Using a predictor before compression can help with images too.

The second type of compression offered is the use of object streams.

Often in a PDF you have many similar objects. These can now be combined into a single object and then compressed. The documentation for the Multivalent Compress tool mentions that object streams are used but doesn't have many details on the actual choice of which objects to group together. The compression will be better if you group similar objects together into an object stream.
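
The effect of grouping can be illustrated with a small Python sketch. It builds the decoded contents of an object stream (a header of "object number, offset" pairs followed by the objects themselves) and compares compressing the objects together against compressing each one on its own. The link annotations are made-up sample objects and the function names are hypothetical; I don't know how the Multivalent tool actually picks its groups.

```python
import zlib


def build_object_stream_payload(objects):
    """Build the decoded data of an object stream.

    objects: list of (object_number, serialized_bytes); stream objects
    cannot be stored this way. Returns (payload, n, first), where n and
    first are the values for the /N and /First dictionary entries.
    """
    body = bytearray()
    pairs = []
    for number, data in objects:
        pairs.append(f"{number} {len(body)}")  # offset is relative to /First
        body += data + b"\n"
    header = (" ".join(pairs) + "\n").encode("ascii")
    return header + bytes(body), len(objects), len(header)


def make_link_annotation(i):
    """Hypothetical sample object: one of many near-identical link annotations."""
    return (
        f"<< /Type /Annot /Subtype /Link /Rect [72 {700 - 12 * i} 300 {712 - 12 * i}] "
        f"/Border [0 0 0] /A << /S /URI /URI (http://example.com/page{i}) >> >>"
    ).encode("ascii")


if __name__ == "__main__":
    objects = [(10 + i, make_link_annotation(i)) for i in range(40)]
    payload, n, first = build_object_stream_payload(objects)
    together = len(zlib.compress(payload, 9))
    separately = sum(len(zlib.compress(data, 9)) for _, data in objects)
    print("objects:", n, "together:", together, "separately:", separately)
```

Similar objects share long substrings, so putting them in the same stream lets Flate encode the repeats as back-references instead of paying for them again in every object.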

Bing