Does anyone know of a PDF file parser that I could use to pull sections of text out of a plaintext PDF file? Specifically, I want a way to reliably pull out the text that belongs to annotations.
Delphi, C#, RegEx, I don't mind.
Not sure if it supports the functionality you need, but we've been using abcPDF with some success.
The PDF File Parser article on xactpro seems to be exactly what you need. It explains the format of the PDF and comes with full source code for a parser (and another project for visualisation of the model).
The parser uses format-specific terms, but you could easily use the visualiser to learn what to look for.
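If a full parser is more than you need, a regex pass may be enough for simple files. Below is a rough sketch in Python (the patterns should port to C# or Delphi regex libraries). It is not taken from the article above, and the function name `find_annotations` is just illustrative. It assumes the PDF objects are stored uncompressed (not inside object streams), that each annotation dictionary carries the optional `/Type /Annot` entry, and that `/Contents` is a literal string without nested parentheses. Files that break any of those assumptions need a real parser.

```python
import re

# One indirect object: "<num> <gen> obj ... endobj".
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)\bendobj", re.DOTALL)
# Annotation dictionaries usually (but not always) declare /Type /Annot.
ANNOT_RE = re.compile(rb"/Type\s*/Annot\b")
# /Contents as a literal string; backslash escapes allowed, nested parens not handled.
CONTENTS_RE = re.compile(rb"/Contents\s*\(((?:\\.|[^\\()])*)\)", re.DOTALL)


def find_annotations(pdf_bytes):
    """Return (object_number, contents_or_None) for each annotation object found."""
    results = []
    for match in OBJ_RE.finditer(pdf_bytes):
        body = match.group(3)
        if not ANNOT_RE.search(body):
            continue  # not an annotation dictionary
        contents = CONTENTS_RE.search(body)
        # Escape sequences are left undecoded in this sketch.
        text = contents.group(1).decode("latin-1") if contents else None
        results.append((int(match.group(1)), text))
    return results


if __name__ == "__main__":
    sample = (
        b"1 0 obj\n<< /Type /Page /Annots [2 0 R] >>\nendobj\n"
        b"2 0 obj\n<< /Type /Annot /Subtype /Text /Contents (Hello note) >>\nendobj\n"
    )
    print(find_annotations(sample))
```

For a real file you would read it in binary mode (`open(path, "rb").read()`) and pass the bytes in. Anything compressed or encrypted will simply produce no matches, which is the point at which one of the libraries mentioned here becomes the better option.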
You can also take a look at Xpdf (http://www.foolabs.com/xpdf/download.html)