I have a sequence of JPG images. Each of the scans is already cropped to the exact size of one page. They are sequential pages of a valuable, out-of-print book. The publishing application requires that these pages be submitted as a single PDF file.

I could take each of these images and just paste them into a word processor (e.g. OpenOffice). Unfortunately, it's a very big book and I've got quite a few of these books to get through, so that would obviously be time-consuming. This is volunteer work!

My second idea was to use LaTeX (actually pdflatex) - I could make a very simple document that consists of nothing more than a series of in-line image includes. I'm sure that this approach could be made to work, it's just a little on the complex side for something which seems like a very simple job.

It occurred to me that there must be a simpler way - so any suggestions?

I'm on Ubuntu 9.10, my primary programming language is Python, but if the solution is super-simple I'd happily adopt any technology that works.


UPDATE: Can somebody explain what's going wrong here?

sal@bobnit:/media/NIKON D200/DCIM/100HPAIO/bat$ convert '*.jpg' bat.pdf
convert: unable to open image `*.jpg': No such file or directory @ blob.c/OpenBlob/2439.
convert: missing an image filename `bat.pdf' @ convert.c/ConvertImageCommand/2775.

Is there a way in the convert command syntax to specify that bat.pdf is the output?

Thanks

+11  A: 

It occurred to me that there must be a simpler way - so any suggestions?

You're right, there is! Try this:

sudo apt-get install imagemagick
cd ~/rare-book-images
convert "*.jpg" rare-book.pdf

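Note: the quotes stop the shell from expanding the wildcard, so convert receives the literal "*.jpg" and has to expand it itself. Depending on your shell and setup this may not work as expected; if it fails, try omitting the quotes so that the shell expands *.jpg into the list of filenames.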
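If quoting keeps causing trouble, another option is to expand the wildcard yourself and hand convert an explicit list of filenames. Below is a minimal Python sketch of that idea; `build_convert_command` and `run_convert` are hypothetical helper names, and it assumes ImageMagick's `convert` is on the PATH. The optional `limit` makes it easy to try a few pages first before committing to the whole book.

```python
import subprocess
from glob import glob


def build_convert_command(pattern, output, limit=None):
    """Return an argument list for ImageMagick's convert.

    The wildcard is expanded here in Python (plain sorted filename order),
    so neither the shell nor ImageMagick has to expand it. `limit` keeps
    only the first N files, for a quick trial run.
    """
    files = sorted(glob(pattern))
    if limit is not None:
        files = files[:limit]
    if not files:
        raise FileNotFoundError(f"no files match {pattern!r}")
    # convert treats the last argument as the output file
    return ["convert", *files, output]


def run_convert(pattern, output, limit=None):
    # The argument list is passed directly, with no shell involved,
    # so spaces in filenames need no quoting
    subprocess.run(build_convert_command(pattern, output, limit), check=True)
```

For example, `run_convert("*.jpg", "rare-book.pdf", limit=5)` builds a five-page test PDF; drop `limit` once the first pages look right.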

John Feminella
I would recommend trying it on a subset of the files first, just to make sure things look good for the first few pages. If you have a lot of pages, this will be an expensive operation.
John Feminella
You may want to use quotes (`'*.jpg'`), since ImageMagick is smarter than the shell about getting things in the right order.
cobbal
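Whichever program expands the wildcard, it is worth checking the page order before building a large PDF: the shell sorts names as text, so page10.jpg comes before page2.jpg unless the numbers are zero-padded. Here is a minimal Python sketch of a "natural" sort key that compares digit runs as numbers; `natural_key` and `pages_in_order` are illustrative names, not part of any library.

```python
import re
from glob import glob


def natural_key(filename):
    """Sort key that compares runs of digits numerically: page2 < page10."""
    # re.split with a capturing group keeps the digit runs; they always
    # land at the odd indices, so the pieces alternate text, number, text...
    parts = re.split(r"(\d+)", filename)
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]


def pages_in_order(pattern="*.jpg"):
    """Return the matching files sorted in natural (page) order."""
    return sorted(glob(pattern), key=natural_key)
```

For example, `sorted(["page10.jpg", "page2.jpg", "page1.jpg"], key=natural_key)` gives `['page1.jpg', 'page2.jpg', 'page10.jpg']`, whereas a plain `sorted()` would put page10.jpg second.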
@cobbal: That's not a bad idea, thanks.
John Feminella
That sounds like a great solution! I'm going to try it out now.
Salim Fadhley
That is really freakin' simple. :)
jathanism
It doesn't seem to work as expected; see the update above.
Salim Fadhley
@Salim: Hmm, that's odd. What happens if you omit the quotes?
John Feminella
+1. But as far as I remember, convert may consume a lot of memory if there are many pages. In that case a better solution would probably be to convert each image separately (with convert or sam2p) and then concatenate the results with pdftk.
jetxee
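The per-page approach from this comment could be scripted roughly as follows. This is a sketch that assumes ImageMagick's convert and pdftk are both installed; it uses convert for every page (sam2p is the other option mentioned) and plans the commands as argument lists before running them. `per_page_commands` and `run_all` are hypothetical names.

```python
import subprocess
from pathlib import Path


def per_page_commands(images, output):
    """Plan one convert call per image, then a single pdftk concatenation.

    Each convert process handles only one image, which should keep
    ImageMagick's memory use modest; pdftk then joins the one-page PDFs
    in the order given.
    """
    # Hypothetical naming scheme: page001.jpg -> page001.pdf, next to the image
    page_pdfs = [str(Path(img).with_suffix(".pdf")) for img in images]
    commands = [["convert", img, pdf] for img, pdf in zip(images, page_pdfs)]
    # pdftk syntax: pdftk in1.pdf in2.pdf ... cat output combined.pdf
    commands.append(["pdftk", *page_pdfs, "cat", "output", output])
    return commands


def run_all(commands):
    """Run each command in turn, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

With `from glob import glob`, something like `run_all(per_page_commands(sorted(glob("*.jpg")), "book.pdf"))` would produce the combined file; the intermediate one-page PDFs are left next to the images and can be deleted afterwards.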
+7  A: 

If you're interested in a Python solution, you can use the ReportLab library. For example:

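from glob import glob
from reportlab.lib.pagesizes import letter
from reportlab.platypus import Image, PageBreak, SimpleDocTemplate

doc = SimpleDocTemplate('image-collection.pdf', pagesize=letter)
parts = []
# glob() has no guaranteed order, so sort to keep the pages in sequence
for i, filename in enumerate(sorted(glob('*.jpg'))):
    if i:
        parts.append(PageBreak())  # each scan starts on a new page
    # Fit each scan inside the frame (doc size minus 6pt padding per side), keeping aspect ratio
    parts.append(Image(filename, width=doc.width - 12, height=doc.height - 12, kind='proportional'))
doc.build(parts)

This takes all the .jpg files in the current directory, in filename order, scales each one to fit inside the page margins, and writes them one per page to a file called "image-collection.pdf". Without the scaling, ReportLab draws each image at its pixel size in points, so a high-resolution scan would be too large for the page.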

ars
A: 

I wonder if you could just do it in a LaTeX file with a for loop containing an \includegraphics command, plus some suitably nifty standard image file naming. This might have the advantage of allowing title pages, page numbering and so on. (I'm not sure either of the other solutions does this, and I can't be bothered to check. I'm just pondering out loud here, really.)
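For the LaTeX route, the loop could just as easily live in a small Python script that writes the .tex file for pdflatex. This is a minimal sketch, assuming the graphicx and geometry packages are available and, to be safe, that the filenames contain no spaces or extra dots; `make_latex` is a hypothetical helper, and the zero margins and empty page style are choices you may want to change (for example, to add a title page or page numbers in the preamble).

```python
def make_latex(images):
    """Return a LaTeX document that places each image on its own page."""
    lines = [
        r"\documentclass{article}",
        r"\usepackage{graphicx}",
        # Assumption: zero margins so each scan can fill the whole page
        r"\usepackage[margin=0pt]{geometry}",
        r"\pagestyle{empty}",  # no page numbers drawn over the scans
        r"\begin{document}",
    ]
    for img in images:
        # Scale to the page while keeping the scan's aspect ratio
        lines.append(
            r"\noindent\includegraphics[width=\paperwidth,height=\paperheight,"
            r"keepaspectratio]{%s}" % img
        )
        lines.append(r"\clearpage")
    lines.append(r"\end{document}")
    return "\n".join(lines) + "\n"
```

Writing the result with `Path("book.tex").write_text(make_latex(sorted(glob("*.jpg"))))` and then running `pdflatex book.tex` should give one scan per page; the layout may need tweaking for your page size.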

Seamus