Hello,
I need to extract text from pdf files using itext. The problem is that some pdf files contain 2 columns, when I extract text, I obtain as result a text file where columns are merged (in the same line, we found 2 columns)
this is the code:
public class pdf {
private static String INPUTFILE = "http://www.revuemedecinetropicale.com/TAP_519-522_-_AO_07151GT_Rasoamananjara__ao.pdf" ;
private static String OUTPUTFILE = "c:/new3.pdf";
public static void main(String[] args) throws DocumentException,
IOException {
Document document = new Document();
PdfWriter writer = PdfWriter.getInstance(document,
new FileOutputStream(OUTPUTFILE));
document.open();
PdfReader reader = new PdfReader(INPUTFILE);
int n = reader.getNumberOfPages();
PdfImportedPage page;
// Go through all pages
for (int i = 1; i <= n; i++) {
page = writer.getImportedPage(reader, i);
Image instance = Image.getInstance(page);
document.add(instance);
}
document.close();
PdfReader readerN = new PdfReader(OUTPUTFILE);
for (int i = 1; i <= n; i++) {
String myLine = PdfTextExtractor.getTextFromPage(readerN,i);
System.out.println(myLine);
try {
FileWriter fw = new FileWriter("c:/yo.txt",true);
fw.write(myLine);
fw.close();
}catch (IOException ioe) {ioe.printStackTrace(); }
}
Could you please answer me how to do this task and extract each column of the pdf file