I see many questions and answers about using C# to generate PDF files.
I have a related, but different task.
I have a large number of existing PDF files, and I would like to validate certain parts of their content with regular expressions. I want to open the PDFs in C# and read out the text in something approaching linear order.
It doesn't matter if headers, footers, sidebars, etc. get skipped or read out of order. I'm just after as much of the main-body text as I can retrieve.
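To make the goal concrete, here is a rough sketch of the validation step I have in mind once the text is available. The class name, the `IsValid` helper, the invoice-number pattern, and the sample string are all hypothetical placeholders; the only real API used is `System.Text.RegularExpressions`. The part I'm missing is whatever produces the extracted text in the first place.

```csharp
using System;
using System.Text.RegularExpressions;

static class PdfTextValidation
{
    // Hypothetical rule: the document must contain an ID like "INV-004217".
    // This stands in for whatever content rule actually needs checking.
    static readonly Regex InvoiceNumber =
        new Regex(@"\bINV-\d{6}\b", RegexOptions.Compiled);

    // Returns true if the extracted text contains at least one match.
    public static bool IsValid(string extractedText) =>
        InvoiceNumber.IsMatch(extractedText);

    static void Main()
    {
        // Placeholder input: in the real task this string would come from
        // a PDF library, which is the piece I'm asking about.
        string text = "Some header\nInvoice INV-004217 issued to a customer\nSome footer";
        Console.WriteLine(IsValid(text));
    }
}
```

Since the check only looks for a match anywhere in the text, it wouldn't matter if headers or footers come out in the wrong place, as long as the main-body text comes through mostly intact.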
Can you point me towards tools, libraries, APIs, etc. that will let me read the text in PDF files programmatically?