Hi All,

I want to download PDFs from website X and then combine them all into one file, so that I can easily view them all at once.

What I did:

  1. get pdfs from website

    wget -r -l1 -A.pdf --no-parent http://linktoX

  2. combine pdfs into one

    gs -dNOPAUSE -sDEVICE=pdfwrite -sOUTPUTFILE=Combined_$(date +%F).pdf -dBATCH file1.pdf file2.pdf file3.pdf

My problem is that I would like to automate all of this in one script, so that I don't have to do it by hand every day. New PDFs are added to X daily.

So, how can I do step 2 above without giving the full list of PDFs? I tried using file*.pdf in step 2, but it combined the PDFs in random order.

The other problem is that the number of file*.pdf files is not the same every day: sometimes there are 5 PDFs, sometimes 10. The nice thing is that they are named in order: file1.pdf, file2.pdf, ...

So I need some help completing step 2 so that all PDFs are combined in order and I don't have to name each PDF explicitly.

Thanks.

UPDATE: This solved the problem

pdftk `ls -rt kanti*.pdf` cat output Kanti.pdf

I used ls -rt because file1.pdf was downloaded first, then file2.pdf, and so on. Plain ls -t lists the newest file first, so it put file20.pdf at the start and file1.pdf at the end.

+1  A: 

I have used pdftk before for such concatenations, as pdftk happens to be readily available on Debian / Ubuntu.

Dirk Eddelbuettel
I want to combine the PDFs in order; using *.pdf combines them in random order. I want them combined in this order: file1.pdf ... file9.pdf, file10.pdf, file11.pdf, and so on.
seg.server.fault
+1  A: 

You could do something like:

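# Build the gs command; $(date +%F) stamps the output name with today's date
GSCOMMAND="gs -dNOPAUSE -sDEVICE=pdfwrite -sOUTPUTFILE=Combined_$(date +%F).pdf -dBATCH"
# Sort numerically on the digits that follow the 4-character prefix "file"
FILES=`ls file*.pdf | sort -n -k 1.5`

# Both variables are left unquoted on purpose so the shell splits them into arguments
$GSCOMMAND $FILES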

This assumes the files are named "file<N>.pdf" (file1.pdf, file2.pdf, ...). See also the post by alberge.

It will do strange things to files with spaces in their name, so you'll need to add escaping if you need to be able to handle names with spaces.
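If names with spaces do matter, one way around the escaping is to avoid parsing ls at all and build the argument list by counting upward. This is only a rough sketch: the function name combine_numbered_pdfs is made up for illustration, and it assumes the files are numbered contiguously from 1 (prefix1.pdf, prefix2.pdf, ...) with no gaps.

```shell
#!/bin/sh
# Sketch: merge prefix1.pdf, prefix2.pdf, ... in numeric order with gs.
# Assumption: numbering starts at 1 and has no gaps; the loop stops at
# the first missing number.
combine_numbered_pdfs() {
    prefix=$1
    set --                          # reuse "$@" as a space-safe file list
    i=1
    while [ -e "${prefix}${i}.pdf" ]; do
        set -- "$@" "${prefix}${i}.pdf"
        i=$((i + 1))
    done
    if [ "$#" -eq 0 ]; then
        echo "no ${prefix}<N>.pdf files found" >&2
        return 1
    fi
    # Quoted "$@" passes every name as exactly one argument.
    gs -dNOPAUSE -dBATCH -sDEVICE=pdfwrite \
       -sOUTPUTFILE="Combined_$(date +%F).pdf" "$@"
}
```

Calling combine_numbered_pdfs file in the download directory would then merge file1.pdf, file2.pdf, and so on, however many there are that day.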

I'm really curious what other people will come up with, as this seems to me quite a quick and dirty solution, though it is getting better thanks to the other answers :)

EDIT

Used the numerical sort command for FILES as suggested by alberge.

extraneon
+2  A: 

I've also used pdftk in the past with good results.

For listing the files in numeric order, you can instruct sort to ignore the first $n - 1 characters of the filename by doing this:

ls | sort -n -k 1.$n

So if you had file*.pdf:

$ ls | sort -n -k 1.5
file1.pdf
file2.pdf
file3.pdf
file4.pdf
file10.pdf
file11.pdf
file20.pdf
file21.pdf
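
To avoid counting characters by hand, the offset can be derived from the prefix length. The sketch below uses a made-up list of names instead of a real directory. It also shows sort -V (version sort) as a possible alternative that needs no offset; that option is an assumption about your system, since it comes from GNU sort and may be missing elsewhere.

```shell
#!/bin/sh
# Sketch: compute the key offset for "sort -k 1.$n" from the prefix.
prefix=file                       # illustrative prefix
n=$(( ${#prefix} + 1 ))           # position of the first digit (5 for "file")

# Sample names in a deliberately scrambled order.
names='file10.pdf
file2.pdf
file1.pdf
file20.pdf
file3.pdf'

# Numeric sort on the text starting at character $n of the line.
by_offset=$(printf '%s\n' "$names" | sort -n -k "1.$n")

# Alternative, assuming GNU sort: version sort handles embedded numbers.
by_version=$(printf '%s\n' "$names" | sort -V)

printf '%s\n' "$by_offset"
```

With these sample names both approaches give the same order, from file1.pdf up to file20.pdf.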
alberge