Possible Duplicate:
Extract each column of a pdf file

I need to extract text from PDF files using iText. The problem is that some PDF files have a two-column layout: when I extract the text, the resulting text file has the columns merged (each output line contains text from both columns).

This is the code:

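// Imports assume iText 5 (com.itextpdf packages); adjust them for other versions.
import java.io.FileOutputStream;
import java.io.FileWriter;
import java.io.IOException;

import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.Image;
import com.itextpdf.text.pdf.PdfImportedPage;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.PdfWriter;
import com.itextpdf.text.pdf.parser.PdfTextExtractor;

public class pdf {

    private static String INPUTFILE = "http://www.revuemedecinetropicale.com/TAP_519-522_-_AO_07151GT_Rasoamananjara__ao.pdf";
    private static String OUTPUTFILE = "c:/new3.pdf";

    public static void main(String[] args) throws DocumentException, IOException {
        Document document = new Document();
        PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(OUTPUTFILE));
        document.open();

        PdfReader reader = new PdfReader(INPUTFILE);
        int n = reader.getNumberOfPages();
        PdfImportedPage page;

        // Go through all pages and copy each one into the new document
        for (int i = 1; i <= n; i++) {
            page = writer.getImportedPage(reader, i);
            Image instance = Image.getInstance(page);
            document.add(instance);
        }
        document.close();

        // Re-open the copy and extract the text page by page
        PdfReader readerN = new PdfReader(OUTPUTFILE);
        for (int i = 1; i <= n; i++) {
            String myLine = PdfTextExtractor.getTextFromPage(readerN, i);
            System.out.println(myLine);

            // Append the page text to the output file
            try {
                FileWriter fw = new FileWriter("c:/yo.txt", true);
                fw.write(myLine);
                fw.close();
            } catch (IOException ioe) {
                ioe.printStackTrace();
            }
        }
    }
}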

Could you please tell me how to do this, i.e. how to extract each column of the PDF file separately?
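
To make the goal concrete, here is a minimal sketch of the idea in plain Java, without any iText calls: text chunks are assigned to a column by their x position and then ordered top to bottom within each column. The `ColumnSplitter` class, the `Chunk` type and the `splitColumns` method are hypothetical names made up for this illustration, and the sketch assumes the gutter between the two columns sits at the middle of the page, which may not hold for every file. With iText 5, a similar effect can probably be obtained by restricting extraction to one rectangle per column (the parser package has `RegionTextRenderFilter` and `FilteredTextRenderListener` for that), but that variant is not shown or tested here.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not part of iText: splits positioned text into two columns.
class ColumnSplitter {

    // Hypothetical stand-in for a piece of text with its position on the page.
    // Coordinates follow the PDF convention: y grows upward from the page bottom.
    static final class Chunk {
        final float x;
        final float y;
        final String text;

        Chunk(float x, float y, String text) {
            this.x = x;
            this.y = y;
            this.text = text;
        }
    }

    // Returns { leftColumnText, rightColumnText } for a page of the given width.
    static String[] splitColumns(List<Chunk> chunks, float pageWidth) {
        float middle = pageWidth / 2f; // assumption: the gutter is at mid-page
        List<Chunk> left = new ArrayList<>();
        List<Chunk> right = new ArrayList<>();
        for (Chunk c : chunks) {
            (c.x < middle ? left : right).add(c);
        }
        return new String[] { join(left), join(right) };
    }

    // Orders chunks top to bottom, then left to right; chunks sharing the same y
    // are put on one line separated by a space.
    private static String join(List<Chunk> column) {
        List<Chunk> sorted = new ArrayList<>(column);
        sorted.sort(Comparator.comparingDouble((Chunk c) -> -c.y)
                .thenComparingDouble(c -> c.x));
        StringBuilder sb = new StringBuilder();
        Float lastY = null;
        for (Chunk c : sorted) {
            if (lastY != null) {
                sb.append(c.y == lastY ? " " : "\n");
            }
            sb.append(c.text);
            lastY = c.y;
        }
        return sb.toString();
    }
}
```

Calling `ColumnSplitter.splitColumns(chunks, pageWidth)` returns the left column text at index 0 and the right column text at index 1, so each column could be written to its own file instead of ending up merged line by line.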