Hello,

I've created a PDF with a table using iTextSharp, following an example I found at http://itextsharp.sourceforge.net/tutorial/ch05.html. Now I'd like to read the data back out of that table, again using iTextSharp, but I can't find any documentation on how to do this. Can someone give me an example?

+2  A: 

Unfortunately you can't do this in iTextSharp. The section entitled "Advanced: reading PDF" on the iTextSharp page at SourceForge says:

The pdf format is just a canvas where text and graphics are placed without any structure information. As such there aren't any 'iText-objects' in a PDF file. In each page there will probably be a number of 'Strings', but you can't reconstruct a phrase or a paragraph using these strings. There are probably a number of lines drawn, but you can't retrieve a Table-object based on these lines. In short: parsing the content of a PDF-file is NOT POSSIBLE with iText. Post your question on the newsgroup news://comp.text.pdf and maybe you will get some answers from people that have built tools that can parse PDF and extract some of its contents, but don't expect tools that will perform a bullet-proof conversion to structured text.

Jay Riggs
A: 

I also needed to read data from a PDF. What I ended up doing was converting the PDF to text and then parsing the resulting string to get at the data.

In my scenario I wanted to take the data in the tables and convert them to Excel.
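
The string-parsing step might look roughly like the sketch below. It is written in Python only to illustrate the idea and is not the code I actually used. It assumes the PDF has already been converted to plain text by some other tool, and that the table columns come out separated by two or more spaces, which will not hold for every PDF. The sample text and the names parse_table and rows_to_csv are made up for the example.

```python
import csv
import io
import re

# Hypothetical sample of text already extracted from a PDF page.
# Assumption: columns are separated by runs of two or more spaces.
SAMPLE_TEXT = """\
Name          Qty   Price
Widget        2     3.50
Large gadget  10    12.00
"""


def parse_table(text):
    """Split each non-empty line into cells on runs of 2+ whitespace characters."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if line:
            rows.append(re.split(r"\s{2,}", line))
    return rows


def rows_to_csv(rows):
    """Serialize the rows as CSV text, a format Excel can open directly."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


rows = parse_table(SAMPLE_TEXT)
csv_text = rows_to_csv(rows)
```

Splitting on two or more spaces keeps a cell such as "Large gadget" in one piece, but it breaks down if the extracted text puts only a single space between columns or wraps a cell across lines, so the pattern usually has to be adjusted per document.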

Marthinus