I have a savable PDF file with a bunch of fields that users will fill out. From there I'd like to batch process these files by extracting the user-entered field values into a CSV file.

Since I'm a .NET guy, I've taken a look at both PDFBox and iTextSharp. With PDFBox I was able to extract the form's static text, but not the values that a user enters into the fields. This doesn't seem trivial with either library, although I could be wrong.

Is there a faster way to do this in any other language? I've heard about an Adobe SDK and will research that next, but I know nothing about it so far. Or does someone know how to accomplish what I'm trying to do with the aforementioned libraries?

UPDATE: Does no one know of any open source or free libraries? I'm doing this more as a proof of concept and don't have a few hundred dollars to throw at the problem.

A: 

Although I haven't used this particular product from ASPOSE, the ASPOSE.Pdf.Kit component will extract both field names and field data.

From their literature:

"You can also Read all form fields of the PDF documents including their names and values into XML, FDF (Form Data Format) and XFDF files."
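If the field data does end up in XFDF files, turning them into CSV needs nothing beyond a standard library, since XFDF is plain XML. Below is a minimal Python sketch of that step; it is not taken from the Aspose documentation. It assumes the usual XFDF layout (`field` elements with a `name` attribute and `value` children, nested fields joined with a dot), and the helper names `xfdf_fields` and `to_csv` as well as the sample data are made up for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Namespace used by XFDF documents.
XFDF_NS = "{http://ns.adobe.com/xfdf/}"

# Hypothetical sample of what an exported XFDF file might contain.
SAMPLE = """<xfdf xmlns="http://ns.adobe.com/xfdf/" xml:space="preserve">
  <fields>
    <field name="applicant">
      <field name="name"><value>Ada Lovelace</value></field>
      <field name="email"><value>ada@example.com</value></field>
    </field>
    <field name="newsletter"><value>Yes</value></field>
  </fields>
</xfdf>"""


def xfdf_fields(root):
    """Return {qualified field name: value} for a parsed XFDF root element."""
    result = {}

    def walk(node, prefix):
        for field in node.findall(XFDF_NS + "field"):
            name = prefix + field.get("name", "")
            values = [v.text or "" for v in field.findall(XFDF_NS + "value")]
            if values:
                # Multi-select fields can carry several values; join them.
                result[name] = ";".join(values)
            # Child fields get the parent's name as a dotted prefix.
            walk(field, name + ".")

    fields = root.find(XFDF_NS + "fields")
    if fields is not None:
        walk(fields, "")
    return result


def to_csv(records, out):
    """Write one row per form; columns are the sorted union of field names."""
    columns = sorted({name for rec in records for name in rec})
    writer = csv.DictWriter(out, fieldnames=columns, restval="")
    writer.writeheader()
    writer.writerows(records)
```

For a batch run, each exported file could be loaded with `ET.parse(path).getroot()`, passed through `xfdf_fields`, and the resulting list of dictionaries handed to `to_csv` together with an open output file.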

Kev
A: 

Back when I looked into this (several years ago), you had to use Acrobat Professional, not Acrobat Reader, to fill in forms whose data could later be read back. I've never understood why you couldn't do it with Acrobat Reader.

lumpynose
A: 

Try Apago's PDFspy, http://www.apagoinc.com/pdfspy

Dwight Kelly
A: 

I highly recommend the Tall Components PDF products. I have used the TallPDF.NET component for generating PDFs dynamically. The Tall Components PDFKit.NET would probably do everything you need. I have not used that specific product, but if it is anything like their TallPDF component it will be excellent. It is pretty expensive though, somewhere around $700 for a license, but they do have an evaluation download for you to try out.

Jeff Widmer