I have some PDFs, each with two attached files that have static names. I would like to use iTextSharp to extract these files to a temp directory so that I can work with them further. I tried following the tutorial here, but I ran into problems because iTextSharp.text.pdf.PdfReader doesn't have a getCatalog() method as shown in the bottom example.

Any advice on how I can extract the attachments? Let's just say for ease that the PDF document is at "C:\test.pdf" and the two attachments are stored as "attach1.xml" and "attach2.xml".

A: 

I ended up finding a way to do this, although not exactly programmatically. I included a binary called "pdftk.exe" (PDF ToolKit), which has command-line options to extract the attachments.

To clarify, I added pdftk.exe to the project, then called it via Process.Start("./pdftk", "contains_attachments.pdf unpack_files output \"C:\\output_directory\""). Note that pdftk will not output to a folder whose path ends with a trailing backslash. You can find pdftk here: http://www.accesspdf.com/pdftk/

After adding the .exe file to the project, you need to set its "Copy to Output Directory" property to "Copy always" or "Copy if newer" so that it ends up next to your built executable.
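
For anyone who wants something a bit more robust than the one-line Process.Start call, here is a minimal sketch of how it could be wrapped. The class and method names (AttachmentExtractor, BuildArguments, Unpack) are hypothetical names chosen for illustration, and it assumes pdftk.exe has been copied next to your executable as described above. It trims any trailing separator from the output folder because of the trailing-backslash issue, and it waits for pdftk to finish so the extracted files exist before you try to read them.

```csharp
using System;
using System.Diagnostics;
using System.IO;

// Hypothetical helper; not part of iTextSharp or pdftk.
static class AttachmentExtractor
{
    // Builds the argument string for pdftk's unpack_files operation.
    // Trailing separators are trimmed from the output folder because
    // pdftk will not write to a path that ends with a backslash.
    public static string BuildArguments(string pdfPath, string outputDir)
    {
        string dir = outputDir.TrimEnd('\\', '/');
        return string.Format("\"{0}\" unpack_files output \"{1}\"", pdfPath, dir);
    }

    // Runs pdftk and returns its exit code once it has finished.
    public static int Unpack(string pdftkPath, string pdfPath, string outputDir)
    {
        // Make sure the target folder exists before pdftk writes into it.
        Directory.CreateDirectory(outputDir);

        var startInfo = new ProcessStartInfo(pdftkPath, BuildArguments(pdfPath, outputDir))
        {
            UseShellExecute = false, // start the .exe directly, no shell
            CreateNoWindow = true    // do not flash a console window
        };

        using (Process process = Process.Start(startInfo))
        {
            // Block until pdftk is done so the attachments are on disk.
            process.WaitForExit();
            return process.ExitCode;
        }
    }
}
```

With the paths from the question this might be called as AttachmentExtractor.Unpack("pdftk.exe", @"C:\test.pdf", @"C:\temp\attachments"), where the output folder is only an example. After it returns, attach1.xml and attach2.xml should be in that folder if the extraction succeeded; checking the returned exit code before reading the files is probably a good idea.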

Adam S