I just updated to the newest version of JTidy, which came out in October, and it seems to have broken my Document object for unknown reasons. This is my code:

tidy = new Tidy();
tidy.setShowWarnings(false);
tidy.setShowErrors(0);
tidy.setQuiet(true);
tidy.setMakeClean(true);

URL url = new URL(url_string);
Document doc = tidy.parseDOM(url.openStream(), null);

String xpath_string = "//table[@id='links']//a";
XPath xpath = XPathFactory.newInstance().newXPath();
XPathExpression expr = xpath.compile(xpath_string);
NodeList n = (NodeList)expr.evaluate(doc, XPathConstants.NODESET);

And this is the error I am getting:

javax.xml.transform.TransformerException: -1
    at com.sun.org.apache.xpath.internal.XPath.execute(Unknown Source)
    at com.sun.org.apache.xpath.internal.XPath.execute(Unknown Source)
    at com.sun.org.apache.xpath.internal.jaxp.XPathExpressionImpl.eval(Unknown Source)
    at com.sun.org.apache.xpath.internal.jaxp.XPathExpressionImpl.eval(Unknown Source)
    at com.sun.org.apache.xpath.internal.jaxp.XPathExpressionImpl.evaluate(Unknown Source)
    at IndoorClimbing.main(IndoorClimbing.java:55)
Caused by: java.lang.ArrayIndexOutOfBoundsException: -1
    at com.sun.org.apache.xml.internal.dtm.ref.ExpandedNameTable.getType(Unknown Source)
    at com.sun.org.apache.xml.internal.dtm.ref.DTMDefaultBase.indexNode(Unknown Source)
    at com.sun.org.apache.xml.internal.dtm.ref.dom2dtm.DOM2DTM.addNode(Unknown Source)
    at com.sun.org.apache.xml.internal.dtm.ref.dom2dtm.DOM2DTM.nextNode(Unknown Source)
    at com.sun.org.apache.xml.internal.dtm.ref.DTMDefaultBase._firstch(Unknown Source)
    at com.sun.org.apache.xml.internal.dtm.ref.DTMDefaultBase.getFirstChild(Unknown Source)
    at com.sun.org.apache.xml.internal.dtm.ref.DTMDefaultBaseTraversers$ChildTraverser.first(Unknown Source)
    at com.sun.org.apache.xpath.internal.axes.AxesWalker.getNextNode(Unknown Source)
    at com.sun.org.apache.xpath.internal.axes.AxesWalker.nextNode(Unknown Source)
    at com.sun.org.apache.xpath.internal.axes.WalkingIterator.nextNode(Unknown Source)
    at com.sun.org.apache.xpath.internal.axes.NodeSequence.nextNode(Unknown Source)
    at com.sun.org.apache.xpath.internal.axes.NodeSequence.runTo(Unknown Source)
    at com.sun.org.apache.xpath.internal.axes.NodeSequence.setRoot(Unknown Source)
    at com.sun.org.apache.xpath.internal.axes.LocPathIterator.execute(Unknown Source)
    ... 6 more
---------
java.lang.ArrayIndexOutOfBoundsException: -1
    at com.sun.org.apache.xml.internal.dtm.ref.ExpandedNameTable.getType(Unknown Source)
    at com.sun.org.apache.xml.internal.dtm.ref.DTMDefaultBase.indexNode(Unknown Source)
    at com.sun.org.apache.xml.internal.dtm.ref.dom2dtm.DOM2DTM.addNode(Unknown Source)
    at com.sun.org.apache.xml.internal.dtm.ref.dom2dtm.DOM2DTM.nextNode(Unknown Source)
    at com.sun.org.apache.xml.internal.dtm.ref.DTMDefaultBase._firstch(Unknown Source)
    at com.sun.org.apache.xml.internal.dtm.ref.DTMDefaultBase.getFirstChild(Unknown Source)
    at com.sun.org.apache.xml.internal.dtm.ref.DTMDefaultBaseTraversers$ChildTraverser.first(Unknown Source)
    at com.sun.org.apache.xpath.internal.axes.AxesWalker.getNextNode(Unknown Source)
    at com.sun.org.apache.xpath.internal.axes.AxesWalker.nextNode(Unknown Source)
    at com.sun.org.apache.xpath.internal.axes.WalkingIterator.nextNode(Unknown Source)
    at com.sun.org.apache.xpath.internal.axes.NodeSequence.nextNode(Unknown Source)
    at com.sun.org.apache.xpath.internal.axes.NodeSequence.runTo(Unknown Source)
    at com.sun.org.apache.xpath.internal.axes.NodeSequence.setRoot(Unknown Source)
    at com.sun.org.apache.xpath.internal.axes.LocPathIterator.execute(Unknown Source)
    at com.sun.org.apache.xpath.internal.XPath.execute(Unknown Source)
    at com.sun.org.apache.xpath.internal.XPath.execute(Unknown Source)
    at com.sun.org.apache.xpath.internal.jaxp.XPathExpressionImpl.eval(Unknown Source)
    at com.sun.org.apache.xpath.internal.jaxp.XPathExpressionImpl.eval(Unknown Source)
    at com.sun.org.apache.xpath.internal.jaxp.XPathExpressionImpl.evaluate(Unknown Source)
    at IndoorClimbing.main(IndoorClimbing.java:55)
--------------- linked to ------------------
javax.xml.xpath.XPathExpressionException
    at com.sun.org.apache.xpath.internal.jaxp.XPathExpressionImpl.evaluate(Unknown Source)
    at IndoorClimbing.main(IndoorClimbing.java:55)
Caused by: javax.xml.transform.TransformerException: -1
    at com.sun.org.apache.xpath.internal.XPath.execute(Unknown Source)
    at com.sun.org.apache.xpath.internal.XPath.execute(Unknown Source)
    at com.sun.org.apache.xpath.internal.jaxp.XPathExpressionImpl.eval(Unknown Source)
    at com.sun.org.apache.xpath.internal.jaxp.XPathExpressionImpl.eval(Unknown Source)
    ... 2 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: -1
    at com.sun.org.apache.xml.internal.dtm.ref.ExpandedNameTable.getType(Unknown Source)
    at com.sun.org.apache.xml.internal.dtm.ref.DTMDefaultBase.indexNode(Unknown Source)
    at com.sun.org.apache.xml.internal.dtm.ref.dom2dtm.DOM2DTM.addNode(Unknown Source)
    at com.sun.org.apache.xml.internal.dtm.ref.dom2dtm.DOM2DTM.nextNode(Unknown Source)
    at com.sun.org.apache.xml.internal.dtm.ref.DTMDefaultBase._firstch(Unknown Source)
    at com.sun.org.apache.xml.internal.dtm.ref.DTMDefaultBase.getFirstChild(Unknown Source)
    at com.sun.org.apache.xml.internal.dtm.ref.DTMDefaultBaseTraversers$ChildTraverser.first(Unknown Source)
    at com.sun.org.apache.xpath.internal.axes.AxesWalker.getNextNode(Unknown Source)
    at com.sun.org.apache.xpath.internal.axes.AxesWalker.nextNode(Unknown Source)
    at com.sun.org.apache.xpath.internal.axes.WalkingIterator.nextNode(Unknown Source)
    at com.sun.org.apache.xpath.internal.axes.NodeSequence.nextNode(Unknown Source)
    at com.sun.org.apache.xpath.internal.axes.NodeSequence.runTo(Unknown Source)
    at com.sun.org.apache.xpath.internal.axes.NodeSequence.setRoot(Unknown Source)
    at com.sun.org.apache.xpath.internal.axes.LocPathIterator.execute(Unknown Source)
    ... 6 more

The error occurs at the last line of code when trying to generate the NodeList. Has anyone had issues like this with the new version of JTidy?

A: 

Since the error happens inside com.sun.org.apache (the Xalan-based XPath implementation bundled with the JDK), I don't think it's a JTidy issue.

Try to strip down your example so you can file a bug report against the XalanJ project.
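A stripped-down reproduction could start from a baseline like the sketch below. It uses only the JDK (no JTidy): it parses a small, made-up XHTML string with the JDK's own DocumentBuilder and runs the same XPath expression as the question. The sample markup and the class name `XPathRepro` are hypothetical. The idea is to confirm the query works on a JDK-built DOM, then pass the Document from `tidy.parseDOM(...)` to `countLinks` instead; if only that one fails, both cases belong in the bug report.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class XPathRepro {
    // Hypothetical sample markup standing in for the real page
    static final String SAMPLE =
        "<html><body><table id='links'>"
        + "<tr><td><a href='/a'>A</a></td></tr>"
        + "<tr><td><a href='/b'>B</a></td></tr>"
        + "</table><a href='/outside'>not in the table</a></body></html>";

    // Runs the question's XPath against a DOM and returns the number of matches
    static int countLinks(Document doc) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList n = (NodeList) xpath.compile("//table[@id='links']//a")
                .evaluate(doc, XPathConstants.NODESET);
        return n.getLength();
    }

    // Builds the baseline DOM with the JDK parser instead of JTidy
    static Document parseWithJdk(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        // Baseline run on a JDK-built DOM; prints 2 (the link outside the table is not matched)
        System.out.println(countLinks(parseWithJdk(SAMPLE)));
    }
}
```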

Aaron Digulla
A: 

I had a similar problem and found a rather silly workaround (re-parsing the JTidy output), which suggests the problem lies with JTidy after all.

// Parse the HTML with JTidy as before
Document document = tidy.parseDOM(rstream, null);

// Serialize JTidy's DOM into an in-memory byte buffer...
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
Source xmlSource = new DOMSource(document);
Result outputTarget = new StreamResult(outputStream);
TransformerFactory.newInstance().newTransformer().transform(xmlSource, outputTarget);

// ...then parse it back with the JDK's own parser, giving a DOM that XPath accepts
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
InputStream is = new ByteArrayInputStream(outputStream.toByteArray());
Document doc = db.parse(is);

It took me hours to find; I hope this helps.
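The workaround can be wrapped in a reusable helper. The sketch below uses only JDK classes: `reparse` takes any `org.w3c.dom.Document` (here, the one `tidy.parseDOM` returns) and hands back a fresh DOM built by the JDK parser. Since JTidy isn't on the classpath in this sketch, the demo builds its input document by hand; the class and method names are hypothetical. The round trip costs a full serialize and parse, so it is a stopgap rather than a fix.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class Reparse {
    // Serializes any DOM with an identity transform, then parses the bytes
    // back with the JDK's DocumentBuilder, yielding a DOM built entirely by the JDK
    static Document reparse(Document source) throws Exception {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        TransformerFactory.newInstance().newTransformer()
                .transform(new DOMSource(source), new StreamResult(out));
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return db.parse(new ByteArrayInputStream(out.toByteArray()));
    }

    // Hand-built stand-in for the JTidy output: two links inside table#links
    static Document sampleDocument() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element html = doc.createElement("html");
        doc.appendChild(html);
        Element table = doc.createElement("table");
        table.setAttribute("id", "links");
        html.appendChild(table);
        for (String href : new String[] {"/a", "/b"}) {
            Element a = doc.createElement("a");
            a.setAttribute("href", href);
            table.appendChild(a);
        }
        return doc;
    }

    // Same query as in the question, evaluated on the re-parsed DOM
    static int countLinks(Document doc) throws Exception {
        NodeList n = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate("//table[@id='links']//a", doc, XPathConstants.NODESET);
        return n.getLength();
    }

    public static void main(String[] args) throws Exception {
        Document doc = reparse(sampleDocument());
        System.out.println(countLinks(doc)); // prints 2
    }
}
```

In real code, `reparse(tidy.parseDOM(url.openStream(), null))` would replace the direct use of JTidy's Document before compiling and evaluating the XPath expression.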

myFriendJoe
DANG. This just bit me too. Thanks for the hard work, myFriendJoe!
Electrons_Ahoy
A: 

@myFriendJoe: Thank you! I had the same problem and it works now.

SEI