PDF document manipulation

+1 Q:

I have several PDFs with the following properties:

Each PDF contains a variable number of "documents", each with a different number of pages.

Each page in a "document" has text such as "Page 3 of 26".

I want to automatically identify the first and last page of each "document" within a PDF and extract them into a new PDF for later printing and archival. (Note: this is not the same as the first and last page of the PDF itself, since each PDF may contain several "documents".)

I'm not sure what tools I can bring to bear on this problem and what libraries are available to tackle this.

Any recommendations? Preferably something free that can be used to build a tool that runs on Windows.

You can try using pdftk to decompress the PDF, parse the data, split it, and then recompress it.
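For the split step, a minimal sketch in Python, assuming pdftk is installed and on the PATH and that the wanted page numbers are already known. The function names and file names are made up for illustration. It only relies on pdftk's `cat` operation (`pdftk in.pdf cat 1 26 output out.pdf`):

```python
import subprocess


def pdftk_cat_command(input_pdf, pages, output_pdf):
    """Build a pdftk command that copies the given 1-based pages into output_pdf."""
    if not pages:
        raise ValueError("no pages selected")
    # pdftk syntax: pdftk <input> cat <page> <page> ... output <output>
    return ["pdftk", input_pdf, "cat", *[str(p) for p in pages], "output", output_pdf]


def extract_pages(input_pdf, pages, output_pdf):
    """Run pdftk to write the selected pages to output_pdf (requires pdftk on the PATH)."""
    subprocess.run(pdftk_cat_command(input_pdf, pages, output_pdf), check=True)
```

For example, `extract_pages("scan.pdf", [1, 26, 27, 30], "first_last.pdf")` would copy pages 1, 26, 27 and 30 into a new file. Passing the command as a list avoids shell quoting problems, which matters on Windows paths with spaces.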

Adam Rosenfield 2009-04-08 15:53:02

Check this library out.

For a commercial solution, you could try this one.

Konstantinos 2009-04-08 15:55:15

I managed to come up with a horrible Unix hack that will work:

Use pdftk to decompress the PDF and explode it into separate pages.
Use pdftotext to convert each page into text.
Write a script to find the appropriate string in the text and copy the corresponding PDF page into a sub-directory [in progress].
Find some tool to recombine the pages [to be investigated; pdftk can probably do it].
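The script in the third step could look roughly like this. It is only a sketch in Python, assuming the text of every page has already been extracted (for example with pdftotext) and that each page carries a label of the form "Page 3 of 26" as described in the question. The name `find_boundary_pages` is hypothetical:

```python
import re

# Matches page labels such as "Page 3 of 26" (wording taken from the question).
PAGE_LABEL = re.compile(r"Page\s+(\d+)\s+of\s+(\d+)", re.IGNORECASE)


def find_boundary_pages(page_texts):
    """Return the 1-based PDF page numbers that start or end a "document".

    page_texts holds one string per PDF page, in page order.
    """
    keep = []
    for pdf_page, text in enumerate(page_texts, start=1):
        match = PAGE_LABEL.search(text)
        if match is None:
            continue  # no label on this page; a real tool should probably report it
        current, total = int(match.group(1)), int(match.group(2))
        # "Page 1 of N" is a first page, "Page N of N" is a last page.
        # A one-page document ("Page 1 of 1") matches both but is kept once.
        if current == 1 or current == total:
            keep.append(pdf_page)
    return keep
```

The returned page numbers can then be handed to whatever tool does the recombining. The sketch takes the first label it finds on a page, so body text that happens to contain a similar phrase would need extra handling.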

This should work on my Unix platform, but I'm not sure whether it is acceptable to bring all these tools into the Windows environment.

One possibility is to use an email gateway that receives PDFs and returns the processed PDF, which makes it even uglier.

Does anyone have a native Win32 solution?

2009-04-08 16:40:02

+1 A:

Java has a nice free PDF library. Check out iText.

From iText's site:

You can use iText to:

Serve PDF to a browser
Generate dynamic documents from XML files or databases
Use PDF's many interactive features
Add bookmarks, page numbers, watermarks, etc.
Split, concatenate, and manipulate PDF pages
Automate filling out of PDF forms
Add digital signatures to a PDF file
And much more...

Since it's Java, there should be no issues running on Windows, or anywhere else for that matter.

Steve K 2009-04-08 16:47:23
