We've got a .NET 2.0 web system that dynamically builds PDF files. Some of these files can get pretty large - 12MB+. Processing time isn't really a factor, but in some cases the size of the files to be downloaded is.

For the moment, let's assume that our B-grade pdf library is already making the smallest files that it knows how. (Although, if anyone has any suggestions on that front, do see this related question.)

However, taking the 12MB file in question and sending it through Acrobat Distiller results in a roughly 700K file, with no appreciable loss in print quality.

I'd love to have some kind of post-processor that does even a third of that. Does anyone have any controls they know about that'll do something like this?

The cheaper the better, for this project, but we're not averse to throwing a few bucks down.

(Some preemptive comments: naturally, rewriting the existing PDF generation code with a new tool is off the table at the moment. Also, while Distiller seems to have an API, calling that on a web server doesn't seem like the most efficient course - and Distiller is a little pricey. Finally, we'd just as soon not wrap the PDFs in a zip file or some such, since that may baffle the clients somewhat. No, really.)

Thanks!

A: 

File a bug with the maker of your pdf library? If it's open source, fix a couple of the low hanging fruit (there are probably many) and submit a patch?

Ron
Lord, I wish it was open source. This is one of many, many issues I'd have dug into by now. For what it's worth, I don't think it's a bug so much as - different opinions about what are important features. (Also, skimpy documentation.)
Electrons_Ahoy
I had an idea -- can you enable mod_gzip or mod_deflate on your webserver for serving the PDF files? The headers would look like:
Content-Type: blah blah pdf
Content-Encoding: gzip
Ron
A properly built PDF file has most of its content compressed, so it is unlikely that another layer of compression will help much. An image-laden file will be helped the least.
RBerteig
That's right, but in the question he implies that zipping the PDF may confer size advantages at the expense of bafflement.
Ron
A: 

I don't have a specific answer to your question, so I hope that my response is not poor form.

I've used pdftk for a variety of PDF-related tasks. It's easy to use from the shell and I see that it does have a compression feature. You could try it out quickly to see if it's something that would work for post processing for your application.
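A quick trial along those lines could be sketched as follows. This is only a rough sketch: the helper names (`build_pdftk_compress_command`, `size_reduction`, `try_pdftk`) are made up for illustration, and it assumes `pdftk` is on the PATH. Note that pdftk's `compress` output option re-applies page stream compression, so any gain probably depends on whether the library left its streams uncompressed.

```python
import os
import subprocess
from typing import List


def build_pdftk_compress_command(input_pdf: str, output_pdf: str,
                                 executable: str = "pdftk") -> List[str]:
    """Argument list for: pdftk input.pdf output output.pdf compress"""
    return [executable, input_pdf, "output", output_pdf, "compress"]


def size_reduction(original_path: str, processed_path: str) -> float:
    """Fraction by which the processed file is smaller (negative if it grew)."""
    original = os.path.getsize(original_path)
    processed = os.path.getsize(processed_path)
    if original == 0:
        raise ValueError("original file is empty")
    return 1.0 - processed / original


def try_pdftk(input_pdf: str, output_pdf: str) -> float:
    """Run pdftk (hypothetical wrapper) and report the size reduction."""
    subprocess.run(build_pdftk_compress_command(input_pdf, output_pdf), check=True)
    return size_reduction(input_pdf, output_pdf)
```

Calling `try_pdftk("big.pdf", "smaller.pdf")` on one of the 12MB files should show fairly quickly whether this route is worth pursuing.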

Boden
+1  A: 

PDFs usually use JBIG/JBIG2/JPEG2000 compression. Cvision's PDFCompressor is the best for compressing PDFs.

StingyJack
A: 

If you're interested in lossless compression, try my tool Precomp and a file compressor of your choice. Depending on what contents are in your PDF file, Precomp usually enlarges your PDF file so it can be compressed much better afterwards.

schnaader
+1  A: 

There are multiple flavors of PDF with different size/functionality trade-offs. If you are converting text-based documents (Word/Excel/etc.) rather than image documents (TIFF/JPG/BMP/etc.), that would probably explain the smaller file sizes that Distiller gives you. You need to make sure your utility is not just creating image-only PDF files (which are typically much bigger) out of everything. Also, the compression format is very important, ESPECIALLY for color documents. Look for configuration options that allow you to tweak those settings. If you mention the specific PDF builder tool we might be able to give you more specific help on that.

Here is a decent reference on the "flavors" of PDF files:

JohnFx
+11  A: 

Use Ghostscript, which is also available for the 32-bit and 64-bit Windows platforms. It recognises all Adobe Distiller parameters[1] and honors most of them. On top of that, you can inject PostScript programs into the conversion process. I have been using it for a year now in a pre-print production environment on image-heavy PDFs. If the parameters are set correctly, the file size can go from 40MB down to 800kB with no visible loss of quality. I found it to be quite fast; in fact, the documentation states that it may be faster than Adobe Distiller.

And it is free (as in beer as well as in speech).

[1] See distparm.pdf in the help folder of Distiller or look here.

How you use it

You call it from the command line with all your wanted parameters, input and output-files and you're done.

Quick example:

gs -dBATCH -dNOPAUSE -sDEVICE=pdfwrite \
   -dCompatibilityLevel=1.3 -dEncodeColorImages=true \
   -sOutputFile=output.pdf input.pdf
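If the call has to be made from code rather than from a shell, the same command line can be assembled programmatically. Here is a minimal Python sketch of the idea; the same pattern should carry over to starting a process from .NET. The function names are invented for illustration. The `pdf_settings` option maps to Ghostscript's `-dPDFSETTINGS` presets (for example `/ebook`) and is not part of the example above, and on Windows the console executable is usually `gswin32c` or `gswin64c` rather than `gs`.

```python
import subprocess
from typing import List, Optional


def build_gs_command(
    input_pdf: str,
    output_pdf: str,
    executable: str = "gs",              # assumption: "gswin64c"/"gswin32c" on Windows
    compatibility_level: str = "1.3",
    pdf_settings: Optional[str] = None,  # e.g. "/ebook"; not in the example above
) -> List[str]:
    """Return the argument list for a pdfwrite conversion run."""
    command = [
        executable,
        "-dBATCH",
        "-dNOPAUSE",
        "-sDEVICE=pdfwrite",
        f"-dCompatibilityLevel={compatibility_level}",
        "-dEncodeColorImages=true",
    ]
    if pdf_settings is not None:
        command.append(f"-dPDFSETTINGS={pdf_settings}")
    command.append(f"-sOutputFile={output_pdf}")
    command.append(input_pdf)
    return command


def shrink_pdf(input_pdf: str, output_pdf: str, **options) -> None:
    """Run Ghostscript; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(build_gs_command(input_pdf, output_pdf, **options), check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.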

Some valuable resources:

pi
We use it where I work as well: we use Bullzip as a virtual printer (which in turn uses Ghostscript) to print all of our documents (which are custom .NET PrintDocument objects).
SnOrfus
A: 

Aside from using another library, your best bet is to get your library working right. There are some suggestions on your other post - I'm not aware of any 'post process' that you would want to run to compress down the file.

As an aside, does your webserver allow HTTP gzipped content? Transparent to the end user!

(That being said, short PDF files should be pretty impervious to most compression methods - images should be compressed during rendering (and JPEG >> ZIP in this case) - but if you have a lot of text, gzip can help)
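Whether gzip at the HTTP layer would help a given file is easy to measure before touching the server config. Here is a rough sketch (the function names are made up for illustration) that reports the fraction of bytes gzip would save. It uses Python's `gzip` module at its default-like level 6, which is only an approximation of what the web server's compression module would produce.

```python
import gzip


def gzip_savings(data: bytes, level: int = 6) -> float:
    """Return the fraction of bytes saved by gzip (0.0 means no gain)."""
    if not data:
        return 0.0
    compressed = gzip.compress(data, compresslevel=level)
    # Already-compressed input can grow slightly; report that as zero savings.
    return max(0.0, 1.0 - len(compressed) / len(data))


def pdf_gzip_savings(path: str) -> float:
    """Read a file from disk and report its potential gzip savings."""
    with open(path, "rb") as handle:
        return gzip_savings(handle.read())
```

A result near zero would suggest the PDF's streams are already compressed, while a large value points at uncompressed text or fonts in the file.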

Rizwan Kassim
A: 

Don't include entire fonts in the PDF. Taking care of that one can save a few megabytes.

John Nilsson
A: 

If your PDF library is making sub-optimal PDFs, then loading and saving the PDF in any other library ought to give you smaller files. PDFNet SDK Type 3 should be up to this task, and at 360 USD it is cheaper than the Adobe PDF library.

danio
xpdf doesn't support writing PDF files so it couldn't be used to shrink a PDF file.
Dwight Kelly
@Dwight Kelly - I hadn't appreciated that limitation - have corrected my answer.
danio
+1  A: 

Apago have lots of tools for 'tidying up' PDFs

http://www.apagoinc.com/