As briefly discussed in this post and in a comment by the same author in this post, it seems that as of PDF version 1.5 (circa 2003), Adobe's own applications (Distiller, Acrobat, etc.) encode their output in a way that most open-source libraries are unable to parse. PDFs generated by open-source libraries or non-Adobe commercial software appear to be unaffected. (This is my understanding; please correct me if I'm wrong.)

However, my searches don't turn up any developers complaining about this issue, which leads me to believe that the vast majority of PDFs online today are not generated by Adobe software.

My questions are:

  • How many of the internet's PDFs are actually generated by Adobe software, and how many by open-source software?
  • I haven't been able to find anything about this issue online. Is there a reason that no open-source libraries seem to have started supporting the change? Am I missing something? Why would Adobe do this to us :(
+2  A: 

You are making assumptions based on an incorrect comment. Adobe has changed the PDF file format over time to add features, and some of these changes have caused problems with older PDF viewers. The file format and its changes are documented, and version 1.7 of the PDF format is an ISO standard (ISO 32000-1). There is nothing to prevent an open-source library from viewing, parsing, or generating newer versions of the PDF format.

Craig Lebakken
Perfect! I asked and was educated - thanks! (My first post on SO)
SampleJACK
+1  A: 

Adobe introduced a new feature, compressed objects, which caused issues for some libraries such as Sun's PDFRenderer. I wrote a blog article explaining what compressed objects are at http://www.jpedal.org/PDFblog/?p=515 . Most open-source libraries that are still actively developed support this. Are you thinking of a particular library or feature?
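
To check whether a given file actually uses the compressed objects mentioned above, a rough byte scan is often enough. The following is a minimal sketch, not a real parser: it assumes the header version appears in the first 1024 bytes and that the file marks compressed structures with `/Type /ObjStm` (object streams) and `/Type /XRef` (cross-reference streams) in plain, unencoded stream dictionaries. The function names `inspect_pdf_bytes` and `inspect_pdf_file` are made up for illustration.

```python
import re

# Header comment such as "%PDF-1.5" (searched only near the start of the file).
HEADER_RE = re.compile(rb"%PDF-(\d)\.(\d)")
# Stream dictionary entries that mark the PDF 1.5 compressed structures.
OBJSTM_RE = re.compile(rb"/Type\s*/ObjStm\b")
XREF_STREAM_RE = re.compile(rb"/Type\s*/XRef\b")


def inspect_pdf_bytes(data: bytes) -> dict:
    """Heuristically report the header version and compression markers."""
    match = HEADER_RE.search(data[:1024])
    version = None
    if match:
        version = f"{match.group(1).decode()}.{match.group(2).decode()}"
    return {
        "version": version,
        "object_streams": bool(OBJSTM_RE.search(data)),
        "xref_streams": bool(XREF_STREAM_RE.search(data)),
    }


def inspect_pdf_file(path: str) -> dict:
    """Convenience wrapper: read the whole file and scan it."""
    with open(path, "rb") as handle:
        return inspect_pdf_bytes(handle.read())
```

Calling `inspect_pdf_file("some.pdf")` (a placeholder path) returns a small dictionary. A file that reports `object_streams` as true is one that a library without object-stream support would likely fail on. Note that the header version is only a hint, since the document catalog can override it with a `/Version` entry, which this sketch does not read.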

mark stephens
Hey, awesome blog post, thanks! I'm using PDFTK http://www.pdflabs.com/docs/pdftk-man-page/ and I'm not sure whether the PDFs I tested with use the compression, but I've observed version 1.5 and 1.6 PDFs that break the library. I'm specifically interested in functionality such as concatenating specific page ranges from multiple PDFs and adding watermarks, rather than reading or viewing. (My environment is LAMP.)
SampleJACK
Did you ask the pdftk developers about support for object streams?
mark stephens
Yes, I will update if/when I receive a response. Thanks, Mark.
SampleJACK