Parsing a large XML file with SAX fails with the exception below, and I cannot tell where the problem is. Any help is appreciated, thanks!

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 8192

    at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.read(UTF8Reader.java:543)
    at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.load(XMLEntityScanner.java:1742)
    at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.arrangeCapacity(XMLEntityScanner.java:1619)
    at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.skipString(XMLEntityScanner.java:1657)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanEndElement(XMLDocumentFragmentScannerImpl.java:1740)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2930)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:648)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:510)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:807)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:737)
    at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:107)
    at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1205)
    at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:522)
    at javax.xml.parsers.SAXParser.parse(SAXParser.java:395)
    at javax.xml.parsers.SAXParser.parse(SAXParser.java:277)
    at myPackage.MainClass.main(MainClass.java:39)

In the main class, the code is structured as follows:

SAXParserFactory sf = SAXParserFactory.newInstance();
SAXParser sax = sf.newSAXParser();
sax.parse("english.xml", new DefaultElementHandler("page") {
    public void processElement(Element element) {
        // process the element
    }
});

The XML file is huge (4 GB) and full of text. I need to parse the file and process the text.

At the moment the processing step does nothing except print each element to the console, and I still get the out-of-bounds exception.

A: 

You might want to try printing out the error message that goes along with that stack trace. You can do that by adding a call to System.err.println(e.getMessage()), where e is the exception. The message should give you the index that the code tried to access.
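A minimal sketch of where that call could go, assuming the parse happens directly in main. The class name ParseDiagnostics and the helper describe are made up for illustration, and a plain DefaultHandler stands in for the question's DefaultElementHandler, which is not shown in full:

```java
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

public class ParseDiagnostics {

    // Hypothetical helper: one-line summary made of the exception class and its message.
    static String describe(Throwable e) {
        return e.getClass().getName() + ": " + e.getMessage();
    }

    public static void main(String[] args) throws Exception {
        SAXParser sax = SAXParserFactory.newInstance().newSAXParser();
        try {
            // DefaultHandler is a stand-in for the question's DefaultElementHandler("page").
            sax.parse("english.xml", new DefaultHandler());
        } catch (ArrayIndexOutOfBoundsException e) {
            // The message carries the offending index.
            System.err.println(describe(e));
            // Keep the full trace as well, since the message alone loses the call path.
            e.printStackTrace();
        }
    }
}
```

Wrapping only the parse call keeps the throws Exception on main for everything else, so other failures still surface the way they do now.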

If the index is negative then there is most likely an integer overflow. If that's the case, you should file a bug report with Xerces. It's possible that Xerces wasn't designed to handle files that large.
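To illustrate what an integer overflow would look like here: a byte offset past 2 GB no longer fits in a Java int and wraps to a negative value when narrowed. This is only a sketch of the general effect; the class and method names are invented, and nothing in the stack trace shows that Xerces tracks offsets this way.

```java
public class OffsetOverflow {

    // Narrows a 64-bit byte offset to int, which is what happens implicitly
    // whenever a position in a large file is kept in an int variable.
    static int asIntOffset(long byteOffset) {
        return (int) byteOffset;
    }

    public static void main(String[] args) {
        long threeGiB = 3L * 1024 * 1024 * 1024; // 3221225472, a position inside a 4 GB file
        System.out.println(Integer.MAX_VALUE);     // prints 2147483647
        System.out.println(asIntOffset(threeGiB)); // prints -1073741824
    }
}
```

A negative index in the exception message would point in this direction; a small positive one would not.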

Bryan Kyle
Where should I catch the exception, i.e. where do I add System.err.println(e.getMessage())? At the moment I use public static void main(String[] args) throws Exception {}.
jason.Z
From the stack trace, it appears that the invalid index is 8192. I suspect it's a Xerces bug. Even if the file contained invalid UTF-8, Xerces shouldn't throw like this.
ZoogieZork
jason.Z