Hello,
Are there any tools or tricks to automatically extract tables from PDFs? Are there any C# libraries that can do that? Or do you know of other ways this could be handled?
Thank you very much
You can use the iTextSharp library to work with PDFs: http://sourceforge.net/projects/itextsharp/
I've only used it to generate PDFs programmatically, but I'm fairly certain you can also use it to pull them apart.
There's a tutorial here: http://itextsharp.sourceforge.net/tutorial/index.html
PDF files generally do not contain table structures; a page is just text and lines drawn at given positions, so tools that extract tables have to "guess" them from the layout.
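As an illustration of what such guessing can look like, here is a minimal C# sketch. It assumes the text has already been extracted from the page together with its coordinates (that step, done with a PDF library, is not shown). `TextChunk`, `TableGuesser` and the tolerance values are hypothetical names for this example, not part of any library. It groups fragments with nearly equal Y into rows and clusters their X positions into columns; real tools typically need more elaborate heuristics (ruling lines, merged or multi-line cells).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical type: a positioned piece of text, as a PDF text extractor might report it.
public record TextChunk(string Text, double X, double Y);

// Hypothetical helper: guesses a table grid purely from text positions.
public static class TableGuesser
{
    // Groups chunks into rows. PDF user space has Y growing upwards, so chunks are
    // visited top-down (descending Y); a chunk within rowTolerance of the current
    // row's first chunk joins that row, otherwise it starts a new row.
    public static List<List<TextChunk>> GroupIntoRows(IEnumerable<TextChunk> chunks, double rowTolerance = 2.0)
    {
        var rows = new List<List<TextChunk>>();
        foreach (var chunk in chunks.OrderByDescending(c => c.Y))
        {
            var current = rows.LastOrDefault();
            if (current != null && Math.Abs(current[0].Y - chunk.Y) <= rowTolerance)
                current.Add(chunk);
            else
                rows.Add(new List<TextChunk> { chunk });
        }
        // Within each row, order the cells left to right.
        return rows.Select(r => r.OrderBy(c => c.X).ToList()).ToList();
    }

    // Guesses column anchors: sorted X starts are clustered, and a new column
    // begins whenever an X is more than colTolerance past the last anchor.
    public static List<double> GuessColumns(IEnumerable<TextChunk> chunks, double colTolerance = 5.0)
    {
        var columns = new List<double>();
        foreach (var x in chunks.Select(c => c.X).OrderBy(x => x))
        {
            if (columns.Count == 0 || x - columns[^1] > colTolerance)
                columns.Add(x);
        }
        return columns;
    }

    // Builds a rectangular grid: each chunk goes into the column whose anchor is
    // nearest to its X; several chunks in one cell are joined with a space.
    public static string[][] ToGrid(IEnumerable<TextChunk> chunks, double rowTolerance = 2.0, double colTolerance = 5.0)
    {
        var list = chunks.ToList();
        var columns = GuessColumns(list, colTolerance);
        return GroupIntoRows(list, rowTolerance).Select(row =>
        {
            var cells = Enumerable.Repeat("", columns.Count).ToArray();
            foreach (var chunk in row)
            {
                int col = 0;
                for (int i = 1; i < columns.Count; i++)
                    if (Math.Abs(columns[i] - chunk.X) < Math.Abs(columns[col] - chunk.X))
                        col = i;
                cells[col] = cells[col].Length == 0 ? chunk.Text : cells[col] + " " + chunk.Text;
            }
            return cells;
        }).ToArray();
    }
}
```

For example, the chunks ("Name", 50, 700), ("Qty", 200, 700), ("Apple", 50, 685) and ("3", 201, 685.5) come out of `ToGrid` as two rows, `Name | Qty` and `Apple | 3`. The tolerances are in PDF units and would need tuning for each document.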
I found an interesting site and a master's thesis about this topic:
Information Extraction - Utilizing Table Patterns
http://ieg.ifs.tuwien.ac.at/projects/pdf2table/
If anybody finds more information, please keep posting...