What is the easiest way to get the text (words) of a PDF document as one long String or an array of Strings?
I have tried PDFBox, but it is not working for me.
JPedal and Multivalent also offer text extraction in Java, or you could call xpdf through Runtime.exec.
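To illustrate the xpdf route, here is a rough sketch that runs xpdf's pdftotext command-line tool and reads its output. It assumes pdftotext is installed and on the PATH; the class and method names (PdfToTextRunner, buildCommand, extractText, toWords) are made up for this example, and it uses ProcessBuilder, which starts the process the same way Runtime.exec does but makes stream handling easier.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of xpdf or any library.
class PdfToTextRunner {

    // Command line for pdftotext: "-enc UTF-8" selects the output encoding,
    // and "-" as the output file name sends the text to standard output.
    static List<String> buildCommand(String pdfPath) {
        return Arrays.asList("pdftotext", "-enc", "UTF-8", pdfPath, "-");
    }

    // Runs pdftotext (assumed to be on the PATH) and returns the whole
    // document as one long String.
    static String extractText(String pdfPath) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(buildCommand(pdfPath));
        // Pass warnings through to our own stderr so they neither mix with
        // the extracted text nor fill up an unread pipe.
        pb.redirectError(ProcessBuilder.Redirect.INHERIT);
        Process process = pb.start();

        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (InputStream in = process.getInputStream()) {
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
        }

        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("pdftotext exited with code " + exitCode);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    // Splits extracted text on whitespace to get an array of words.
    static String[] toWords(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }
}
```

Calling PdfToTextRunner.extractText("C:/Text.pdf") would then give the one long String, and toWords(...) on that result gives the array of Strings. Splitting on whitespace is a simplification; punctuation stays attached to the words.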
PDFBox barfs on many newer PDFs, especially those with embedded PNG images.
I was very impressed with PDFTextStream.
Use iText. The following snippet, for example, extracts the text of page 3:
PdfTextExtractor parser = new PdfTextExtractor(new PdfReader("C:/Text.pdf"));
String text = parser.getTextFromPage(3); // page numbers start at 1