Hi,
Do you know of a Java library that can extract the text of a PDF document as a string while also preserving all empty lines and spaces exactly as they appear in the PDF document?
I am currently using the PDFTextStripper class from the PDFBox-0.7.3 library and calling its getText() method. It does return the document as a string, but it also removes all empty lines, tabs, and runs of spaces between pieces of text. Line breaks between lines of text are preserved, so I can still recognize the structure of the document, but it is important for me to keep the rest of the whitespace as well. This is the default behaviour of getText(), and there seems to be no way to make it preserve the blank parts of the text (I could not find any method in the API for this purpose).
Thank you for your help.