I have several PDFs with the following properties:
Each PDF contains a variable number of "documents", each with a different number of pages.
Each page in a "document" has text such as "Page 3 of 26".
I want to automatically identify the first and last page of each "document" within a PDF and extract those pages into a new PDF for later printing and archival. (Note: this is not the same as the first and last page of the PDF itself, since each PDF may contain several "documents".)
I'm not sure which tools or libraries are available to tackle this problem.
Any recommendations? Preferably something free that can be used to build a tool that runs on Windows.
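To make the requirement concrete, here is a rough sketch of the selection logic I have in mind. It assumes the text of each page can be extracted as a plain string (so it would not work on scanned images without OCR) and that the marker always looks like "Page 3 of 26". The function names are hypothetical, and pypdf is only one free library that might fit; I have not settled on it.

```python
import re

# Matches markers such as "Page 3 of 26"; the exact wording is an assumption.
PAGE_MARKER = re.compile(r"Page\s+(\d+)\s+of\s+(\d+)", re.IGNORECASE)


def find_boundary_pages(page_texts):
    """Return 0-based indices of the first and last page of each "document".

    A page is kept when its marker reads "Page 1 of N" or "Page N of N".
    A one-page document ("Page 1 of 1") is kept once.
    Pages without a recognisable marker are skipped.
    """
    keep = []
    for index, text in enumerate(page_texts):
        match = PAGE_MARKER.search(text or "")
        if match is None:
            continue
        current, total = int(match.group(1)), int(match.group(2))
        if current == 1 or current == total:
            keep.append(index)
    return keep


def extract_boundary_pages(source_path, target_path):
    """Copy the boundary pages of source_path into a new PDF at target_path."""
    # Third-party dependency, assumed to be installed (pip install pypdf).
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(source_path)
    texts = [page.extract_text() for page in reader.pages]

    writer = PdfWriter()
    for index in find_boundary_pages(texts):
        writer.add_page(reader.pages[index])

    with open(target_path, "wb") as handle:
        writer.write(handle)
```

Something like `extract_boundary_pages("input.pdf", "boundaries.pdf")` (both file names are placeholders) would then produce the PDF for printing. Keeping the marker matching separate from the PDF handling means the page-selection rule can be checked on plain strings without any PDF file.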