Hi,

My company receives data from an external company as Excel files, which we import into SQL Server to run reports. They are now switching to PDF. Is there a way to reliably extract the data from the PDFs and insert it into our SQL Server 2008 database?

Would this require writing an app or is there an automated way of doing this?

A: 

I think you will have to write an application for this. This question talks about extracting data from PDF. Once the data is extracted, you can export it to Excel format so that you can preserve the existing import process.
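As a rough illustration of the export step, here is a minimal Python sketch that writes already-extracted rows out as CSV, which Excel can open. It assumes the existing import can accept CSV; a true .xlsx file would need a third-party library. The function name `rows_to_csv` and the column names in the usage example are hypothetical.

```python
import csv
import io

def rows_to_csv(header, rows):
    """Serialize a header and data rows to CSV text that Excel can open."""
    # newline="" stops the buffer from translating the writer's line endings.
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer)
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()
```

For example, `rows_to_csv(["invoice_no", "customer", "amount"], [("A-100", "Acme, Ltd", 1250.0)])` produces a header line and one data line, with the customer name quoted because it contains a comma.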

Shoban
A: 

Look for information on "Scraping" the data from the PDF. I believe Adobe has some tools that allow you to do this for simple text but I've not used them.

Honestly though, I would try to do anything you can to get this data in a raw format from your vendor.

Brian
+2  A: 

As already mentioned - you will have to write an app to do this, but ideally you would be able to get the raw data from the external company rather than having to process the PDF.

However, if you do want to extract the data from the PDF, I've used iText and found it to be very powerful, reliable and, most importantly, free. It comes in Java and .NET flavours; iTextSharp is the .NET version. It allows you to programmatically manipulate PDF documents, and it will expose the contents of the PDF to the application that you write.

A: 

It all depends on how they've included the data within the PDF. Generally speaking, there are two possible scenarios here:

  1. The data is just a text object within a PDF. You'll need to use a tool to extract the text from the PDF then insert it into your database.

  2. The data is contained within form fields in a PDF. You'll need to use a tool to extract data from the form fields and insert it into your database.

Hopefully scenario #2 applies to you because this is precisely what PDF forms are designed for. Scenario #1 is really just a hack that you'd only use if you didn't have any other options. Extracting plain text from a PDF isn't as easy or accurate as you might expect.
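To show why scenario #1 is fragile, here is a minimal Python sketch of the step that comes after text extraction. It assumes some PDF tool has already produced plain text, and that each data line looks like "<invoice no> <customer name> <amount>". That layout, the regular expression, and the name `parse_extracted_text` are all hypothetical. Lines that don't fit the pattern (headers, page footers) are collected separately rather than silently dropped.

```python
import re

# Assumed line layout, e.g. "A-100 Acme Ltd 1,250.00". Adjust to the real document.
ROW_PATTERN = re.compile(
    r"^(?P<invoice_no>[A-Z]-\d+)\s+(?P<customer>.+?)\s+(?P<amount>[\d,]+\.\d{2})$"
)

def parse_extracted_text(text):
    """Split extracted PDF text into parsed rows and rejected lines."""
    rows, rejected = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        match = ROW_PATTERN.match(line)
        if match is None:
            rejected.append(line)  # keep for manual review
            continue
        amount = float(match["amount"].replace(",", ""))
        rows.append((match["invoice_no"], match["customer"], amount))
    return rows, rejected
```

Reviewing the rejected lines on each run is one way to notice when the vendor changes the layout.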

If you're receiving a PDF form then all you need to do is match up the right fields in the PDF form with the corresponding fields in your database and then suck in the data. This process could be entirely automated if you wrote your own application.
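The field matching could look something like the Python sketch below. It assumes a PDF library has already read the form fields into a dictionary. The field names, column names, and the function `insert_form_data` are hypothetical, and `sqlite3` stands in for SQL Server only so that the example is self-contained; a real SQL Server connection would need its own driver, which may use a different placeholder style.

```python
import sqlite3

# Hypothetical mapping from PDF form field names to database column names.
FIELD_TO_COLUMN = {
    "CustomerName": "customer_name",
    "InvoiceNo": "invoice_no",
    "Amount": "amount",
}

def insert_form_data(conn, table, form_fields):
    """Insert one row built from the mapped form fields.

    Fields with no mapping are ignored; a missing mapped field raises KeyError.
    `table` must be a trusted constant, since it is interpolated into the SQL.
    """
    columns, values = [], []
    for field, column in FIELD_TO_COLUMN.items():
        if field not in form_fields:
            raise KeyError(f"missing form field: {field}")
        columns.append(column)
        values.append(form_fields[field])
    placeholders = ", ".join("?" for _ in columns)
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    conn.execute(sql, values)  # values are passed as parameters, not concatenated
    conn.commit()
```

Failing loudly on a missing field is a deliberate choice here: a renamed field in the vendor's form should stop the import rather than load incomplete rows.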

"Would this require writing an app or is there an automated way of doing this?"

Yes, both of these options would require writing an app or buying an app. If you write your own app then you'll need to find a third-party PDF library that supports retrieving data from form fields or extracting text from a PDF.

Rowan
A: 

By using Automation Anywhere you can extract data from anywhere and transfer it into any database that you have. Go through the details on their data extraction page; from those you can probably tell whether it's a good fit for you. :)

Bob