I want to catch and ignore an ArrayIndexOutOfBoundsException (basically it's not something I have control over, so I need my program to keep chugging along).

However my try/catch pair doesn't seem to catch the exception and ignore it. Hopefully you can pick out what I am doing wrong.

The exception occurs at this line

content = extractor.getTextFromPage(page);

Here is my code:

for(int page=1;page<=noPages;page++){
    try{
        System.out.println(page);     
        content = extractor.getTextFromPage(page);
        }
    }   
    catch (ArrayIndexOutOfBoundsException e){
    System.out.println("This page  can't be read");
    }    
}

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: Invalid index: 02 at com.lowagie.text.pdf.CMapAwareDocumentFont.decodeSingleCID(Unknown Source) at com.lowagie.text.pdf.CMapAwareDocumentFont.decode(Unknown Source) at com.lowagie.text.pdf.parser.PdfContentStreamProcessor.decode(Unknown Source) at com.lowagie.text.pdf.parser.PdfContentStreamProcessor.displayPdfString(Unknown Source) at com.lowagie.text.pdf.parser.PdfContentStreamProcessor$ShowText.invoke(Unknown Source) at com.lowagie.text.pdf.parser.PdfContentStreamProcessor.invokeOperator(Unknown Source) at com.lowagie.text.pdf.parser.PdfContentStreamProcessor.processContent(Unknown Source) at com.lowagie.text.pdf.parser.PdfTextExtractor.getTextFromPage(Unknown Source) at com.pdfextractor.main.Extractor.main(Extractor.java:64)

edit: I have put the try/catch within the for loop
and added the stack trace
and removed index=1

A: 

You need the try/catch to be inside the for loop. Otherwise control pops out to the try/catch, the catch fires, and execution resumes after it, but by then the for loop has already been terminated.
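A minimal sketch of the difference, assuming a hypothetical `readPage` method that stands in for `extractor.getTextFromPage` and fails on one page (it is not the iText call):

```java
import java.util.ArrayList;
import java.util.List;

final class LoopCatchDemo {
    // Hypothetical stand-in for extractor.getTextFromPage: page 3 is unreadable.
    static String readPage(int page) {
        if (page == 3) {
            throw new ArrayIndexOutOfBoundsException("Invalid index on page " + page);
        }
        return "text of page " + page;
    }

    // try/catch around the whole loop: the first failure ends the loop.
    static List<Integer> tryOutsideLoop(int noPages) {
        List<Integer> read = new ArrayList<>();
        try {
            for (int page = 1; page <= noPages; page++) {
                readPage(page);
                read.add(page);
            }
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("Stopped early: " + e.getMessage());
        }
        return read;
    }

    // try/catch inside the loop: the bad page is skipped and the loop continues.
    static List<Integer> tryInsideLoop(int noPages) {
        List<Integer> read = new ArrayList<>();
        for (int page = 1; page <= noPages; page++) {
            try {
                readPage(page);
                read.add(page);
            } catch (ArrayIndexOutOfBoundsException e) {
                System.out.println("This page can't be read: " + page);
            }
        }
        return read;
    }

    public static void main(String[] args) {
        System.out.println(tryOutsideLoop(5)); // pages read before the failure
        System.out.println(tryInsideLoop(5));  // every page except the bad one
    }
}
```

With five pages, the first version reads pages 1 and 2 and then stops, while the second reads 1, 2, 4 and 5.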

kolosy
Thanks, but I am still getting the problem. Initially I had it in the for loop.
Ankur
+1  A: 

Instead of using this exception, you should fix your code so that you do not go past array boundaries!

Most arrays count from 0 up to array.length-1

If you replace your for loop with this, you might find it avoids the entire issue:

for (int page = 0;page < noPages;page++){
Matt
The code causing the problem is not my own, and the page numbers correspond to real page numbers, so they cannot be arbitrarily changed. In short, I am not causing the exception, but I need to handle/ignore it.
Ankur
But do you know what page number causes the exception? 99% of the time you should be using less than (<) instead of <= when looping through an array. Did you give that a try?
Matt
Yes, tried it. The page number causing the problem is page=31, and there are a total of 39 pages in the document. So in short, the error has nothing to do with this loop; it is coming out of the code behind the getTextFromPage() method. Since I only get it on some documents, and at different places in each document, it must have something to do with how that method works. It's part of the iText package.
Ankur
Yea, I see your stack trace you just added, and clearly it is coming from the PDF API you're using. Not sure exactly why it isn't being caught. You have an extra brace in your code but that should actually be causing a compile error. I guess, if desperate, you can try catching Throwable to see if you can trap *something*
Matt
Thanks Matt, will try that.
Ankur
A: 
    for(int page=1;page<=noPages;page++)
    {
        try
        {
            content = extractor.getTextFromPage(page); 
            System.out.println(content);
        }
        catch (ArrayIndexOutOfBoundsException e)
        {
            System.out.println("This page can't be read");
        }
    }
Ambrosia
page=0 causes another error - there is no page=0 the first page is page=1. The page variable corresponds to real page numbers.
Ankur
is there a count method you can call on the extractor object? Where do you get noPages from?
Ambrosia
Yes, there is other code that gets that value: int noPages = reader.getNumberOfPages(); That code works fine.
Ankur
+3  A: 

Stupid question, but is the ArrayIndexOutOfBoundsException that you put in the catch from the same package as the one being thrown? i.e. java.lang

Or perhaps catch throwable to see if that even works.

digiarnie
That's actually a really good question. It wasn't the case, but then I added "import java.lang.ArrayIndexOutOfBoundsException;" and I still get the error. I also have "import java.io.IOException;". I am wondering if that can cause a conflict?
Ankur
It shouldn't. What happened when you tried catch (Throwable t)?
digiarnie
Sorry was in a meeting - just tried throwable and it seems to be the best solution.
Ankur
+3  A: 

It is possible that the code that you are calling is handling the ArrayIndexOutOfBoundsException and printing the stack trace on its own without rethrowing it. If that is the case, you would not see your System.out.println called.

EDIT: If you want to keep chugging along, it would be good to know that PdfContentStreamProcessor#processContent will catch the ArrayIndexOutOfBoundsException and then throw an instance of com.lowagie.text.ExceptionConverter, which is a subclass of RuntimeException.
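As a rough sketch of what that means for the calling code: if a library wraps the original exception in its own RuntimeException subclass, a catch for ArrayIndexOutOfBoundsException no longer matches, but a catch for RuntimeException does. The `WrapperException` class and the `getTextFromPage` method below are hypothetical stand-ins written for this example, not iText's real classes:

```java
final class WrappedExceptionDemo {
    // Hypothetical stand-in for a library wrapper such as ExceptionConverter:
    // an unchecked exception that carries the original one as its cause.
    static final class WrapperException extends RuntimeException {
        WrapperException(Exception cause) {
            super(cause);
        }
    }

    // Hypothetical library call: it catches the original exception internally
    // and rethrows it wrapped, so the caller never sees the original type.
    static String getTextFromPage(int page) {
        try {
            if (page == 2) {
                throw new ArrayIndexOutOfBoundsException("Invalid index");
            }
            return "text of page " + page;
        } catch (Exception e) {
            throw new WrapperException(e);
        }
    }

    // Returns how many pages could not be read.
    static int countUnreadable(int noPages) {
        int failed = 0;
        for (int page = 1; page <= noPages; page++) {
            try {
                getTextFromPage(page);
            } catch (RuntimeException e) {
                // Catching the supertype covers both the wrapper and the original.
                Throwable cause = e.getCause() != null ? e.getCause() : e;
                System.out.println("Page " + page + " can't be read: " + cause);
                failed++;
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        System.out.println("Unreadable pages: " + countUnreadable(3));
    }
}
```

Printing the cause (or the exception's class name) is also a cheap way to find out which type is really reaching your catch block.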

akf
Thanks I will have to do a little reading to get my head around throwing and catching but I suspect you are right. Will check it out and report back.
Ankur
Was just about to suggest the same thing. Seems like a high possibility (even if it would be a weird thing for the other component to do)
digiarnie
You could also try and (yuck) debug code and step into the other library (assuming you have either the source code or have a de-compiler in your IDE) and see exactly what it is doing.
digiarnie
Thanks akf - my java skills are unfortunately not good enough to know how to implement what you suggest in your EDIT
Ankur
+3  A: 

Maybe this is a no-brainer (after all, I'm running on 3 hours of sleep in the last 36 hours), but along the lines of what digiarnie and Ankur mentioned: have you tried simply catch (Exception e)?

It's definitely not ideal, since obviously it (along with the Throwable t suggestion) will catch every exception under the sun, not just ArrayIndexOutOfBoundsException. Just thought I'd put the idea out there in case you haven't tried it yet.
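One caveat on the difference between the two: catch (Exception e) does not catch Error and its subclasses, while catch (Throwable t) does. A small sketch, using a hypothetical `decode` method invented for this example:

```java
final class CatchScopeDemo {
    // Hypothetical method that can fail with an unchecked exception or an Error.
    static String decode(int len) {
        if (len > 2) {
            throw new ArrayIndexOutOfBoundsException("Invalid index: " + len);
        }
        if (len == 2) {
            throw new Error("not implemented yet");
        }
        return "ok";
    }

    // Reports which catch clause, if any, handled the failure.
    static String handledBy(int len) {
        try {
            try {
                return decode(len);
            } catch (Exception e) {
                return "catch (Exception)";
            }
        } catch (Throwable t) {
            // Reached only for things that are not Exceptions, such as Error.
            return "catch (Throwable)";
        }
    }

    public static void main(String[] args) {
        for (int len = 1; len <= 3; len++) {
            System.out.println(len + " -> " + handledBy(len));
        }
    }
}
```

Catching Throwable is usually reserved for diagnosis, since it also swallows things like OutOfMemoryError that a program generally cannot recover from.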

Magsol
Yep that works fine - I have actually used this now.
Ankur
A: 

Perhaps this is a silly question... Are you sure that the exception is thrown in the code you posted and not in a different method?

TofuBeer
A: 

The program should have worked. You should give more details, including your class name. You can try catching Exception, or adding a finally block with a System.out.println in it.

fastcodejava
A: 

This is strange - I actually had a look at iText's source for the method the exception is thrown from (CMapAwareDocumentFont.decodeSingleCID) and it looks like this:

 private String decodeSingleCID(byte[] bytes, int offset, int len){
        if (toUnicodeCmap != null){
            if (offset + len > bytes.length)
                throw new ArrayIndexOutOfBoundsException("Invalid index: " + offset + len);
            return toUnicodeCmap.lookup(bytes, offset, len);
        }

        if (len == 1){
            return new String(cidbyte2uni, 0xff & bytes[offset], 1);
        }

        throw new Error("Multi-byte glyphs not implemented yet");
    }

The ArrayIndexOutOfBoundsException it throws is the standard Java one. I can't see any reason for your original try-catch not to work.

Perhaps you should post the entire class? Also, which version of iText are you using?

Annie
A: 

Wait a second! You're missing some braces in there :) Your catch statement is outside your for statement! You have this:

for(int page=1;page<=noPages;page++){
    try{
        System.out.println(page);               
        content = extractor.getTextFromPage(page);
        }
    }   
    catch (ArrayIndexOutOfBoundsException e){
    System.out.println("This page  can't be read");
    }    
}

It should be:

for(int page=1;page<=noPages;page++) {
    try{
        System.out.println(page);               
        content = extractor.getTextFromPage(page);
    }
    catch (ArrayIndexOutOfBoundsException e){
        System.out.println("This page  can't be read");
    } 
} //end for loop  

}//This closes your method or whatever is enclosing the for loop
Annie
Actually, this wouldn't compile if you had it like this.
Annie