I keep finding myself wanting to manipulate PDF files in various odd ways. I asked some time ago about open PDF tools here; now I find I want to have a much deeper understanding of the spec itself.

Does anyone know of any good books or articles explaining the details of the PDF standard? I mean something other than the standard itself: reading standards is kind of a pain, and I'd like to find something that's a little better organized for a new reader.

+2  A: 

There are a number of places you can look, and it really depends on what you are looking for. Overall, if you really want to know "how it works", the standard is what you want. If you are looking for something higher level than that, some of these might be helpful.

Mitchel Sellers
I also started with Adobe's PDF reference when I began making PDFs from scratch.
Stijn Sanders
A: 

If you want to manipulate the contents of a PDF file, the standard is the thing you need. For a standard, the PDF one is rather readable. I found that reading it alongside the (well-structured) source code of a simple PDF generator was good enough for understanding how PDF works. But I already had a background in PostScript, fonts, (La)TeX, enough mathematics, and a few dozen programming languages. By the way, iText is not a simple PDF generator in this context.

Stephan Eggermont
A: 

The PDF standards are readable, but they assume some knowledge of PostScript. Also, they're so massive that it's really hard to grasp the current versions. It is much easier to start with an old spec and go from there. PDF 1.3 (Acrobat 4) is still supported and is the basis of PDF/X.

That said, the best 'high level' description I've read is the intro to The PDFTeX User Manual. It's just a couple of pages and definitely recommended reading.

After that, remember that the format was designed to be read by a PostScript interpreter: everything is defined in terms of PS arrays, dictionaries and streams, usually compressed and encoded with the usual PS filters. To make any sense of it, you'll first need a library that can parse all these objects.
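To give a feel for what "objects and compressed streams" means in practice, here is a rough Python sketch that pulls stream objects out of raw PDF bytes and inflates the ones marked /FlateDecode. The function names (`extract_streams`, `build_sample`) are made up for illustration, and the regex scan is a deliberate shortcut: a real parser would follow the cross-reference table, resolve an indirect /Length, and handle filter chains and object streams, none of which this does.

```python
import re
import zlib

# "N G obj ... endobj" for top-level indirect objects. Simplification:
# assumes the keyword "endobj" never occurs inside binary stream data.
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)\bendobj", re.DOTALL)
# The "stream" keyword followed by its end-of-line marker.
STREAM_START_RE = re.compile(rb"stream\r?\n")
# A direct integer /Length (skips indirect references such as "5 0 R").
LENGTH_RE = re.compile(rb"/Length\s+(\d+)\b(?!\s+\d+\s+R)")


def extract_streams(pdf_bytes):
    """Return {object number: decoded bytes} for each stream object found."""
    streams = {}
    for match in OBJ_RE.finditer(pdf_bytes):
        obj_num = int(match.group(1))
        body = match.group(3)
        start = STREAM_START_RE.search(body)
        if start is None:
            continue  # plain object (dictionary, array, ...), no stream
        header = body[:start.start()]
        length = LENGTH_RE.search(header)
        if length is None:
            continue  # indirect or missing /Length: not handled here
        data = body[start.end():start.end() + int(length.group(1))]
        if b"/FlateDecode" in header:
            data = zlib.decompress(data)  # single Flate filter only
        streams[obj_num] = data
    return streams


def build_sample():
    """Build a tiny PDF-like byte string (not a complete, valid PDF)."""
    content = b"BT /F1 12 Tf 72 720 Td (Hello) Tj ET"
    packed = zlib.compress(content)
    return b"".join([
        b"%PDF-1.3\n",
        b"1 0 obj\n<< /Type /Catalog >>\nendobj\n",
        b"2 0 obj\n<< /Length " + str(len(packed)).encode("ascii"),
        b" /Filter /FlateDecode >>\nstream\n",
        packed,
        b"\nendstream\nendobj\n",
    ])


if __name__ == "__main__":
    for num, data in extract_streams(build_sample()).items():
        print(num, data)
```

On a real file you would read the bytes with `open(path, "rb").read()` and pass them in; expect it to miss anything stored in object streams or encrypted, which is exactly why a proper parsing library is the better starting point.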

Javier
+1  A: 

If you have Adobe Acrobat 9.0, there is a really useful tool hidden inside that lets you explore the PDF internals and see what is going on. I wrote a blog article explaining how to use it at http://pdf.jpedal.org/java-pdf-blog/bid/10479/Viewing-PDF-objects

If you want a slightly less polished open source alternative, there is a similar function built into iText.

There are also lots of open source PDF tools written in Java, C, and other languages which you could examine to see how they work. There are also several PDF forums where people discuss the PDF internals (e.g. Planet PDF).

Unfortunately, the PDF spec is large and detailed. I have had my head buried in it since 1997 and I am still learning things...