I'm trying to dynamically generate PDFs from user input, where I basically print the user input and overlay it on an existing PDF that I did not create.

It works, with one major exception: Adobe Reader doesn't display the result properly, on Windows or on Linux, and QuickOffice on my phone can't read it either. So I traced the steps I used to create the files:

1 - Original PDF of background
PDF 1.2 made with Adobe Distiller, using LZW compression. I didn't make this.

2 - PDF of background
PDF 1.4 made with Ghostscript. I used pdf2ps then ps2pdf on the above to strip LZW so that the reportlab and pyPDF libraries would recognize it. Note that this file looks "fuzzy," like a bad scan, in Adobe Reader, but looks fine in other readers.

3 - PDF of user-input text formatted to be combined with background
PDF 1.3 made with Reportlab from user input. Opens properly and looks good in every reader I've tried.

4 - Finished PDF
PDF 1.3 made with pyPdf's mergePage() function from files 2 and 3.
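The PDF versions in steps 1 to 4 come from each file's `%PDF-x.y` header. A minimal sketch for checking them yourself, assuming the header appears within the first 1024 bytes (the script name `pdfversion.py` is hypothetical). Keep in mind that the header is only a declaration: from PDF 1.4 on, a `/Version` entry in the document catalog can override it, and neither says which features the file actually uses.

```python
import re
from typing import Optional


def pdf_header_version(data: bytes) -> Optional[str]:
    """Return the version from a PDF's '%PDF-x.y' header line, or None."""
    # The header is normally the first line, but some producers put a few
    # bytes in front of it, so look through the first 1024 bytes.
    match = re.search(rb"%PDF-(\d\.\d)", data[:1024])
    return match.group(1).decode("ascii") if match else None


if __name__ == "__main__":
    import sys

    # Usage: python pdfversion.py original.pdf background.pdf merged.pdf
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            print(path, pdf_header_version(f.read(1024)))
```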

Does not open in:
Adobe Reader for Windows
Adobe Reader for Linux
QuickOffice for Android

Opens perfectly in:
Google Docs' PDF viewer on the web
Evince for Linux
Ghostscript viewer for Linux
Foxit Reader for Windows
Preview for Mac

Are there known issues that I should know about? I don't know exactly what "flate" is, but from the internet I gather that it's some sort of open source alternative to LZW for PDF compression? Could that be causing my problem? If so, are there any libraries I could use to fix the cause in my code?
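For context on "flate": it is the deflate algorithm (LZ77 plus Huffman coding), the same one used by zip and gzip, exposed in PDF as the `/FlateDecode` stream filter since PDF 1.2. It is patent-free, which is largely why it displaced LZW. The sketch below only illustrates what a viewer does with such a stream; the content stream is a made-up example, and nothing here shows that Flate is the cause of the rendering problem.

```python
import zlib

# A hypothetical page content stream, as ReportLab might write it before
# compression: set font F1 at 12 pt, move to (72, 720), show "Hello".
content = b"BT /F1 12 Tf 72 720 Td (Hello) Tj ET"

# A /FlateDecode stream holds zlib-format data (RFC 1950 header and checksum
# around a raw RFC 1951 deflate stream), so zlib handles it directly.
encoded = zlib.compress(content)

# This is what a viewer does when it meets /Filter /FlateDecode.
decoded = zlib.decompress(encoded)
print(decoded == content)  # prints True
```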

A:

First remark:

Your 2nd step has many, many drawbacks. If you convert a PDF back to PostScript and then back to PDF again, you are going to lose quality. This process is called "re-frying PDFs", and it is generally frowned upon by PDF professionals. (The reasons: resulting files may look "fuzzy", like bad scans; files may have lost their embedded fonts; files may have had their original fonts replaced; files have certainly lost their transparencies; images have changed resolution; colors have changed....)

Sometimes you have no other choice than "re-frying"... but here you DO.

If you use Ghostscript, you can do a direct PDF-to-PDF conversion of PDF files, with no internal, hidden PostScript conversion happening. (This is a little-known feature of Ghostscript, and therefore this answer would normally deserve lots of upvotes ;-P ).

Since you do want to get rid of internal LZW compression, here is how to do it in Ghostscript:

  1. Download a little utility program, written in the PostScript language, from the Ghostscript source code repository: pdfinflt.ps
  2. Run the following commandline:

    gswin32c.exe -- [c:/path/to/]pdfinflt.ps input.pdf output.pdf

Update: The command line above was originally wrong.
I had given it as gswin32c.exe -- [c:/path/to/]pdfinflt.ps output.pdf input.pdf,
i.e. with input and output in the wrong order. My bad! Sorry about that.
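Since the argument order is exactly what went wrong here, it may help to build the command line in one place if you drive Ghostscript from your Python code. A minimal sketch; the helper names `pdfinflt_command` and `inflate_pdf` are hypothetical, and it assumes the Windows executable `gswin32c.exe` (on Linux, pass `gs="gs"`).

```python
import subprocess
from typing import List


def pdfinflt_command(script: str, input_pdf: str, output_pdf: str,
                     gs: str = "gswin32c.exe") -> List[str]:
    """Build the argv for running pdfinflt.ps: INPUT first, then OUTPUT."""
    # Ghostscript's "--" option runs the next argument as a PostScript file
    # and hands the remaining arguments to that program, in order.
    return [gs, "--", script, input_pdf, output_pdf]


def inflate_pdf(script: str, input_pdf: str, output_pdf: str,
                gs: str = "gswin32c.exe") -> None:
    """Run Ghostscript; raises CalledProcessError on a non-zero exit."""
    # A list argv avoids shell quoting problems with spaces in paths.
    subprocess.run(pdfinflt_command(script, input_pdf, output_pdf, gs),
                   check=True)
```

Usage would look like `inflate_pdf("c:/path/to/pdfinflt.ps", "input.pdf", "output.pdf")`.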

The resulting PDF will have all its internal data streams decompressed, without the quality loss of your PDF ==> PS ==> PDF re-frying.
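To check whether a file still contains LZW-compressed streams (before or after running pdfinflt.ps), you can scan it for filter names. This is a rough heuristic sketch, not a PDF parser: it only sees dictionaries stored uncompressed, so objects packed into compressed object streams (PDF 1.5+) and inline images using abbreviated names such as `/LZW` are missed. The function names are hypothetical.

```python
import re
from typing import Set

# Filter names appear in stream dictionaries as e.g. "/Filter /LZWDecode" or
# "/Filter [/ASCII85Decode /LZWDecode]". Require at least one character before
# "Decode" so the keys /Decode and /DecodeParms are not picked up.
_FILTER_NAME = re.compile(rb"/([A-Za-z0-9]+Decode)\b")


def stream_filters(data: bytes) -> Set[str]:
    """Return the filter names found by a plain byte scan of a PDF file."""
    return {m.decode("ascii") for m in _FILTER_NAME.findall(data)}


def still_uses_lzw(data: bytes) -> bool:
    """Heuristic: True if any uncompressed dictionary names /LZWDecode."""
    return "LZWDecode" in stream_filters(data)
```

For example, `still_uses_lzw(open("output.pdf", "rb").read())` should come back False for a file that pdfinflt.ps has fully decompressed.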

Second remark:

I think you should do your 4th step with a different tool, namely pdftk. This has the advantage of sparing you steps 1 and 2 altogether.

pdftk (PDF Toolkit) is a command-line utility, available on Linux, Unix (pdftk) and Windows (pdftk.exe), which can do a lot of things with PDFs, including overlaying the pages of two PDFs on each other. This is what I'd recommend. pdftk can overlay the PDF from your step "3." onto your original PDF (or vice versa) in one go, without first needing to de-flate or de-LZW either one.

Here are commands for you to test:

    pdftk.exe ^
      original.pdf ^
      background pdf-from-userinput-step3.pdf ^
      output merged.pdf

    pdftk.exe ^
      pdf-from-userinput-step3.pdf ^
      background original.pdf ^
      output merged.pdf

    pdftk.exe ^
      original.pdf ^
      stamp pdf-from-userinput-step3.pdf ^
      output merged.pdf

    pdftk.exe ^
      pdf-from-userinput-step3.pdf ^
      stamp original.pdf ^
      output merged.pdf

You'll probably wonder about the difference between the stamp and background operations. They do what their names suggest: they place the second PDF's page in the foreground or in the background layer. If both PDFs have transparent backgrounds (instead of opaque solid white), the result will in many cases look the same.
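If you want to call pdftk from the same Python code that runs ReportLab, the four commands above differ only in which file comes first and whether `background` or `stamp` is used. A minimal sketch; the helper names are hypothetical, and the executable defaults to `pdftk` (pass `pdftk="pdftk.exe"` on Windows if it is not on your PATH under that name).

```python
import subprocess
from typing import List


def pdftk_overlay_command(base_pdf: str, other_pdf: str, mode: str,
                          output_pdf: str, pdftk: str = "pdftk") -> List[str]:
    """Build argv putting other_pdf behind ('background') or on top of
    ('stamp') every page of base_pdf."""
    if mode not in ("background", "stamp"):
        raise ValueError("mode must be 'background' or 'stamp'")
    # Same shape as: pdftk base.pdf stamp other.pdf output merged.pdf
    return [pdftk, base_pdf, mode, other_pdf, "output", output_pdf]


def overlay(base_pdf: str, other_pdf: str, mode: str, output_pdf: str,
            pdftk: str = "pdftk") -> None:
    """Run pdftk; raises CalledProcessError if it reports an error."""
    subprocess.run(
        pdftk_overlay_command(base_pdf, other_pdf, mode, output_pdf, pdftk),
        check=True,
    )
```

Note that `background` and `stamp` apply only the first page of the second PDF to every page of the first; newer pdftk versions also offer `multibackground` and `multistamp` for page-by-page overlays.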

pipitas
I appreciate your help, but for some reason neither tool works for me. The ps script returns `Error: /ioerror in --readstring--`. Also, I tried pdftk before, and it also gives me errors when handling these PDFs. My workaround, sadly, was to create my own PDFs from scratch by laying down all the lines and text myself. I guess I only had to do it once, so there's that.
Shane
Hmm... that's probably because I swapped the required order of the input and output files. It should be: `gswin32c.exe -- [c:/path/to/]pdfinflt.ps input.pdf output.pdf`. I'll update the answer.
pipitas