Hi, I need to extract the text from a PDF file. The text will most likely be laid out as a table, and it will be used to transfer data automatically between an external party and our systems.
Can anyone suggest a command-line tool (e.g. PDF to text) or a library that would be good for this?
Language options:
- C# (preferred)
- Java (if I must)
I found some ideas here, but I think that question was more about a one-off situation, whereas I need something for a daily import:
http://stackoverflow.com/questions/488089/extracting-tables-from-pdf-files
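To make the goal more concrete, here is a rough sketch of the step I would expect to run on the extracted text once I have it. It is only an illustration under assumptions: it assumes the extraction tool (for example, something like pdftotext run with its -layout option) produces plain text in which table columns are separated by two or more spaces and cell values contain at most single spaces. The class name `LayoutTextTable`, the method `parseRows`, and the sample data are made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: turns layout-preserved text into rows of cells.
// Assumption: columns are separated by runs of two or more whitespace
// characters, and no cell contains such a run itself.
public class LayoutTextTable {

    static List<List<String>> parseRows(String text) {
        List<List<String>> rows = new ArrayList<>();
        // \R matches any line break (\n, \r\n, ...)
        for (String line : text.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines between table rows
            }
            // Split on two or more whitespace characters so that a cell
            // such as "Widget A" (single space) stays in one piece.
            rows.add(Arrays.asList(trimmed.split("\\s{2,}")));
        }
        return rows;
    }

    public static void main(String[] args) {
        // Made-up sample standing in for the extracted PDF text.
        String sample =
              "Id   Name      Amount\n"
            + "1    Widget A  10.50\n"
            + "\n"
            + "2    Gadget    3.00\n";
        for (List<String> row : parseRows(sample)) {
            System.out.println(row);
        }
    }
}
```

A splitter like this breaks down if a cell is empty or if columns are separated by only one space, so a tool or library that reports column positions reliably would be much better for a daily import.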