I have a series of PDFs named sequentially like so:

  • 01_foo.pdf
  • 02_bar.pdf
  • 03_baz.pdf
  • etc.

Using Ruby, is it possible to combine these into one big PDF while keeping them in sequence? I don't mind installing any necessary gems to do the job.

If this isn't possible in Ruby, how about another language? No commercial components, if possible.


Update: Jason Navarrete's suggestion led to the perfect solution:

Place the PDF files needing to be combined in a directory along with pdftk (or make sure pdftk is in your PATH), then run the following script:

pdfs = Dir["[0-9][0-9]_*"].sort.join(" ")
`pdftk #{pdfs} output combined.pdf`

Or I could even do it as a one-liner from the command line:

ruby -e '`pdftk #{Dir["[0-9][0-9]_*"].sort.join(" ")} output combined.pdf`'
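A variant that avoids interpolating file names into a shell string could look roughly like the sketch below. The helper names `pdftk_argv` and `combine_pdfs` are made up for illustration, the glob is narrowed to `*.pdf`, and pdftk's `cat` operation is spelled out explicitly rather than relying on the shorter form used above. It assumes pdftk is on the PATH.

```ruby
# Hypothetical helper: builds the pdftk argument list for a set of input files.
# Sorting keeps 01_, 02_, 03_ ... in sequence.
def pdftk_argv(files, output)
  ["pdftk", *files.sort, "cat", "output", output]
end

# Hypothetical helper: runs pdftk without going through a shell.
# Assumes pdftk is on the PATH.
def combine_pdfs(pattern = "[0-9][0-9]_*.pdf", output = "combined.pdf")
  files = Dir[pattern]
  raise "no files match #{pattern}" if files.empty?
  system(*pdftk_argv(files, output)) or raise "pdftk failed or was not found"
end

# Usage: combine_pdfs
```

Because each argument is passed to `system` separately, file names containing spaces are not split apart by the shell.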

Great suggestion, Jason. Perfect solution, thanks. Give him an up-vote, people.

A: 

I don't think Ruby has tools for that. You might check ImageMagick and Cairo. ImageMagick can be used for binding multiple pictures/documents together, but I'm not sure about the PDF case.

Then again, there are surely Windows tools (commercial) to do this kind of thing.

I use Cairo myself for generating PDFs. If the PDFs are coming from you, maybe that would be a solution (it does support multiple pages). Good luck!

akauppi
Thank you for the suggestions. We do indeed use a variety of tools for creating and even combining PDFs. However, one can't easily automate them, especially the tool that does the combining, hence the desire to script it in Ruby (or another language).
Charles Roper
+2  A: 

You can do this by converting to PostScript and back. PostScript files can be concatenated trivially. For example, here's a Bash script that uses the Ghostscript tools ps2pdf and pdf2ps:

#!/bin/bash
for file in 01_foo.pdf 02_bar.pdf 03_baz.pdf; do
    pdf2ps "$file" - >> temp.ps
done

ps2pdf temp.ps output.pdf
rm temp.ps

I'm not familiar with Ruby, but there's almost certainly some function that will invoke a given command line (it might be called system(), but that's just a guess).
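Ruby's `Kernel#system` is one such function. A rough Ruby equivalent of the Bash script might look like the sketch below. The method name `merge_via_postscript` and its `temp_ps:` and `runner:` parameters are invented for illustration, and it assumes the Ghostscript tools pdf2ps and ps2pdf are on the PATH.

```ruby
# Hypothetical helper mirroring the Bash script: convert each PDF to PostScript,
# append it to a temporary file, then convert the result back to PDF.
# `runner` defaults to Kernel#system and is a parameter only so that the
# commands can be inspected without Ghostscript installed.
def merge_via_postscript(pdfs, output, temp_ps: "temp.ps", runner: method(:system))
  File.delete(temp_ps) if File.exist?(temp_ps)
  pdfs.each do |pdf|
    # "-" sends pdf2ps output to stdout; out: [path, "a"] appends it to temp_ps.
    runner.call("pdf2ps", pdf, "-", out: [temp_ps, "a"]) or raise "pdf2ps failed for #{pdf}"
  end
  runner.call("ps2pdf", temp_ps, output) or raise "ps2pdf failed"
ensure
  # Remove the temporary PostScript file whether or not the merge succeeded.
  File.delete(temp_ps) if File.exist?(temp_ps)
end

# Usage: merge_via_postscript(%w[01_foo.pdf 02_bar.pdf 03_baz.pdf], "output.pdf")
```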

Adam Rosenfield
A: 

I'd suggest looking at the code for PDFCreator (VB, if I'm not mistaken, but that shouldn't matter since you'd just be implementing similar code in another language), which uses GhostScript (GNU license). Or just dig straight into GhostScript itself; there's also a facade layer available called GhostPDF, which may do what you want.

If you can control GhostScript with VB, you can do it with C, which means you can do it with Ruby.

Ruby also has IO.popen, which allows you to call out to external programs that can do this.
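As a minimal sketch of the `IO.popen` approach, the snippet below runs an external command and captures whatever it prints. The helper name `run_and_capture` is made up for illustration, and the command used here is just the Ruby interpreter itself, so the example runs without any PDF tools installed.

```ruby
require "rbconfig"

# Hypothetical helper: runs an external command given as an argument array
# (no shell involved) and returns its combined stdout/stderr plus a success flag.
def run_and_capture(argv)
  output = IO.popen(argv, err: [:child, :out]) { |io| io.read }
  [output, $?.success?]
end

# Demonstrated with the Ruby interpreter itself so that it runs anywhere;
# a Ghostscript or pdftk command line would be passed the same way.
out, ok = run_and_capture([RbConfig.ruby, "-e", "print 40 + 2"])
puts "#{out} (success: #{ok})"
```

Capturing the output this way is mostly useful for logging the external tool's diagnostics when a merge fails.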

JasonTrue
+11  A: 

A Ruby-Talk post suggests using the pdftk toolkit to merge the PDFs.

It should be relatively straightforward to call pdftk as an external process and have it handle the merging. PDF::Writer may be overkill because all you're looking to accomplish is a simple append.

Jason Navarrete
Thanks, I had never come across pdftk before. What a great tool!
Charles Roper
A: 

Any Ruby code to do this in a real application is probably going to be painfully slow. I would try to hunt down Unix tools to do the job. This is one of the beauties of using Mac OS X: it has very fast PDF capabilities built in. The next best thing is probably a Unix tool.

Actually, I've had some success with rtex. If you look here you'll find some information about it. It is much faster than any Ruby library that I've used, and I'm pretty sure LaTeX has a function to bring in PDF data from other sources.

Dan Harper - Leopard CRM
+1  A: 

If you have ghostscript on your platform, shell out and execute this command:

gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile=finished.pdf <your source pdf files>
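From Ruby, shelling out to that command might look something like the sketch below. The helper names `gs_merge_argv` and `gs_merge` and the optional `paper_size:` parameter are made up for illustration, and it assumes the `gs` executable is on the PATH.

```ruby
# Hypothetical helper: builds the Ghostscript argument list shown above.
# paper_size is optional and maps to gs's -sPAPERSIZE switch (e.g. "a4").
def gs_merge_argv(inputs, output, paper_size: nil)
  argv = ["gs", "-dBATCH", "-dNOPAUSE", "-q", "-sDEVICE=pdfwrite"]
  argv << "-sPAPERSIZE=#{paper_size}" if paper_size
  argv << "-sOutputFile=#{output}"
  argv + inputs
end

# Runs gs without a shell; assumes the gs executable is on the PATH.
def gs_merge(inputs, output, **opts)
  system(*gs_merge_argv(inputs, output, **opts)) or raise "gs failed or was not found"
end

# Usage: gs_merge(Dir["[0-9][0-9]_*.pdf"].sort, "finished.pdf")
```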

Steve Hanov
The -sPAPERSIZE option for gs is a useful one to know about. For example -sPAPERSIZE=a4 or -sPAPERSIZE=letter.
Charles Roper
+2  A: 

I tried the pdftk solution and had problems on both Snow Leopard and Tiger. Installing on Tiger actually wreaked havoc on my system and left me unable to run script/server; fortunately, it’s a machine retired from web development.

I subsequently found another option: joinPDF. It was an absolutely painless and fast install, and it works perfectly.

I also tried GhostScript, and it failed miserably (it could not read the fonts, and I ended up with PDFs that had images only).

But if you’re looking for a solution to this problem, you might want to try joinPDF.

Gordon Isnor