Hello,

Are there any tools or tricks for automatically extracting tables from PDFs? Are there any C# libraries that can do this? Or do you know of other ways this could be handled?

Thank you very much

A: 

You can use the iTextSharp library to work with PDFs: http://sourceforge.net/projects/itextsharp/

I've only used it to generate PDFs programmatically, but I'm fairly certain you can also use it to pull them apart.

There's a tutorial here: http://itextsharp.sourceforge.net/tutorial/index.html

ThePaddedCell
Please don't recommend products unless you know whether or not they can actually do what you're recommending them for. It just adds noise.
Rowan
Is it not better to suggest something than to suggest nothing at all?
ThePaddedCell
+1  A: 

PDF files do not contain table structures, only text and lines placed at positions on the page. Several tools will try to 'guess' the tables from that layout.
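To give a rough idea of what such 'guessing' can look like, here is a minimal sketch in Python. It assumes you have already pulled positioned text fragments out of the PDF with some library, as (x, y, text) tuples in PDF coordinates where y grows upward. The function name guess_table, the tolerances, and the sample data are all made up for illustration; real tools use far more elaborate heuristics.

```python
def guess_table(fragments, y_tol=2.0, x_tol=5.0):
    """Guess a table from (x, y, text) fragments (hypothetical input format)."""
    # 1. Group fragments into rows: similar y means same text line.
    #    Sort top-to-bottom (PDF y grows upward), then left-to-right.
    rows = []
    for x, y, text in sorted(fragments, key=lambda f: (-f[1], f[0])):
        if rows and abs(rows[-1]["y"] - y) <= y_tol:
            rows[-1]["cells"].append((x, text))
        else:
            rows.append({"y": y, "cells": [(x, text)]})

    # 2. Cluster the left edges of all fragments into column anchors.
    anchors = []
    for x in sorted(x for row in rows for x, _ in row["cells"]):
        if not anchors or x - anchors[-1] > x_tol:
            anchors.append(x)

    # 3. Put every fragment into the column whose anchor is nearest.
    table = []
    for row in rows:
        line = [""] * len(anchors)
        for x, text in sorted(row["cells"]):
            col = min(range(len(anchors)), key=lambda i: abs(anchors[i] - x))
            line[col] = (line[col] + " " + text).strip()
        table.append(line)
    return table


# Made-up sample: three columns, one missing cell in the last row.
sample = [
    (72, 700, "Name"), (200, 700, "Qty"), (330, 700.5, "Price"),
    (72, 685, "Apple"), (201, 685, "3"), (330, 685, "1.20"),
    (72, 670, "Pear"), (330, 670, "0.80"),
]
for row in guess_table(sample):
    print(row)
```

This only works for left-aligned columns with consistent spacing. Right-aligned numbers, cells that span several columns, or wrapped text inside a cell will fool it, which is why the results of any such tool need checking.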

mark stephens
+2  A: 

I found an interesting site and a master's thesis on this topic:

Information Extraction - Utilizing Table Patterns

http://ieg.ifs.tuwien.ac.at/projects/pdf2table/

If anybody finds more information, please keep posting...

nWorx