views:

435

answers:

6

Is there any way to convert PDF to Excel? I'm looking for a free component but can't find one.

Thanks in advance.

+1  A: 

I hope someone else has a more direct solution, but if not:

I had some data I needed to get out of PDF and into a more palatable format, and found that PDFMiner did a pretty good job. It is a Python script that consumes PDF and spits out XML.

Once I had the XML, which described the layout and the content, I could write another script to get the data into CSV.
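
A minimal sketch of what that second script could look like, assuming the XML is shaped roughly like the output of PDFMiner's `pdf2txt.py -t xml`: `<textline>` elements with a `bbox="x0,y0,x1,y1"` attribute and one `<text>` child per character. The sample string and the function name `textlines_to_csv` are made up for illustration; real output carries more attributes and nesting, so treat this as a starting point only.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample, shaped like PDFMiner's XML output (heavily trimmed).
SAMPLE_XML = """<pages>
<page id="1">
<textbox id="0">
<textline bbox="72.0,700.0,120.0,712.0"><text>N</text><text>a</text><text>m</text><text>e</text></textline>
<textline bbox="72.0,680.0,120.0,692.0"><text>A</text><text>n</text><text>n</text></textline>
</textbox>
</page>
</pages>"""


def textlines_to_csv(xml_text):
    """Return CSV text with one row per <textline>: page, x0, y0, text."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["page", "x0", "y0", "text"])
    for page in root.iter("page"):
        for line in page.iter("textline"):
            x0, y0, _x1, _y1 = (float(v) for v in line.get("bbox").split(","))
            # Each <text> child holds a single character; join them back up.
            text = "".join(t.text or "" for t in line.iter("text")).strip()
            if text:
                writer.writerow([page.get("id"), x0, y0, text])
    return out.getvalue()


csv_text = textlines_to_csv(SAMPLE_XML)
```

Keeping `x0` and `y0` in the CSV means the rows can still be sorted or grouped by position later, which is usually needed to rebuild a table.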

Cannonade
A: 

OpenOffice has a PDF import extension which you can use to convert to ODF format.

http://wiki.services.openoffice.org/wiki/Pdf_Import_Extension

Then you can port it over to Excel however you like (e.g. open it in OpenOffice Calc and save it as Excel).

verhogen
+1  A: 

Assuming the component you refer to will be part of an application that converts PDF files to Excel, such a component may(?) not exist. As implied by the other replies, you should find a way to

  1. Read the content of the PDF file through a PDF reader library, extracting the data you need,
  2. Create an Excel file and populate its cells with the data extracted from the PDF.

I suggest looking for a PDF reader library to extract the data from the PDF file. Once you have one, half of your problem is solved.
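
The two steps might be wired together as in the sketch below. Everything in it is an assumption for illustration: `extract_lines` is a stand-in for whatever PDF reader library you end up choosing (here it just returns canned text), cells are assumed to be separated by runs of two or more spaces, and the output is CSV, which Excel opens directly, rather than a native workbook.

```python
import csv
import io
import re


def extract_lines(pdf_path):
    """Step 1 placeholder: a real version would call a PDF reader library.

    Hypothetical stub that ignores pdf_path and returns canned text.
    """
    return [
        "Item        Qty   Price",
        "Widget      2     3.50",
        "Large bolt  10    0.25",
    ]


def lines_to_rows(lines):
    """Split each line into cells on runs of two or more spaces."""
    return [re.split(r"\s{2,}", line.strip()) for line in lines if line.strip()]


def write_spreadsheet(rows, stream):
    """Step 2: write the rows as CSV, a format Excel can open."""
    csv.writer(stream, lineterminator="\n").writerows(rows)


buffer = io.StringIO()
write_spreadsheet(lines_to_rows(extract_lines("report.pdf")), buffer)
```

Keeping extraction and writing in separate functions means either half can be swapped out later without touching the other.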

OnesimusUnbound
+4  A: 

I've worked with PDF quite a bit over the years, and in my experience this kind of conversion is pointless unless you know exactly how the PDF document is structured. PDF is a layout format; the data in it is optimized for displaying and printing content, not for data extraction.

Sure, you could extract text from a PDF using some library, but you'll have a hard time figuring out the actual order of the text. To figure that out, you'd have to render the document and determine the position of each piece of text. As far as I know, only commercial PDF libs support this kind of extraction.
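
To illustrate the ordering problem: once each piece of text has a position, recovering reading order comes down to grouping fragments into lines and sorting them. The sketch below assumes fragments arrive as hypothetical `(x, y, text)` tuples with `y` growing upward, as in PDF user space, and uses an arbitrary tolerance of 2 units to decide that two fragments share a line. It does not handle multiple columns or rotated text.

```python
def reading_order(fragments, tolerance=2.0):
    """Group (x, y, text) fragments into lines, top to bottom, left to right."""
    lines = []  # each entry: [line_y, [(x, text), ...]]
    # Visit fragments from the top of the page (largest y) downward.
    for x, y, text in sorted(fragments, key=lambda f: -f[1]):
        if lines and abs(lines[-1][0] - y) <= tolerance:
            lines[-1][1].append((x, text))
        else:
            lines.append([y, [(x, text)]])
    # Within each line, order fragments by their x position.
    return [[text for _x, text in sorted(cells)] for _y, cells in lines]


# Fragments in the arbitrary order a content stream might store them.
fragments = [
    (200.0, 699.5, "Qty"),
    (72.0, 680.0, "Widget"),
    (72.0, 700.0, "Item"),
    (200.0, 680.4, "2"),
]
rows = reading_order(fragments)
```

The tolerance matters because text on the same visual line rarely shares exactly the same baseline coordinate.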

Marnix van Valen
A: 

Collecting the data in a spreadsheet

You can use Acrobat Professional to consolidate the information from the returned files into a spreadsheet, such as Microsoft Excel.

  1. Start Acrobat Professional and open the form you saved in the previous section.

  2. Choose File > Form Data > Create Spreadsheet From Data Files.

  3. Click Add Files and locate the XML file that you emailed.

  4. Repeat step 3 if you want to add more files to the list.

  5. Click Export.

  6. Select a location on your computer to save the spreadsheet, and then click Save.

The Create Spreadsheet dialog box displays Done! when Acrobat has created the spreadsheet.

  7. Click View File Now to open the spreadsheet file in your default application.

The spreadsheet shows the names that you entered in the Name box of the Binding tab. This makes the spreadsheet more readable.

  8. Close the spreadsheet.

  9. Click Close Dialog.

  10. Exit Acrobat Professional.

A: 

You might like this one too. It is not free, but it matches your needs:

You can use "CZ-Pdf2Txt". It can extract text from PDF files and convert PDF files to delimited table text or CSV files.

This tool can preserve the original document layout and supports a command-line interface, so you can call it from your application.

In particular, the folder watch function can watch a source folder and convert newly uploaded documents automatically. You can get a demo version and more information from http://www.convertzone.com/pdf2txt/help.htm

regards
flyaga

flyaga