I'm doing some screen scraping using WATIJ, but it can't read HTML tables (throws NullPointerExceptions or UnknownObjectExceptions). To overcome this I read the HTML and run it through JTidy to get well-formed XML.
I want to parse it with XPath, but it can't find a <table ...> by id, even though the table is there in the XML plain as day. Here is my code:
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// tidyHtml holds the XML string produced by JTidy
XPath xPath = XPathFactory.newInstance().newXPath();
InputSource inputSource = new InputSource(new StringReader(tidyHtml));
String expression = "//table[@id='searchResult']";
// evaluate(String, InputSource) returns the string value of the first matching node
String table = xPath.evaluate(expression, inputSource);
System.out.println("table = " + table);
The result, table, is an empty String.
The table is in the XML, however. If I print the tidyHtml String, it shows:
<table class="ApptableDisplayTag" id="searchResult" style="WIDTH: 99%">
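One check that might narrow this down, offered as a sketch rather than a confirmed diagnosis: the String form of evaluate returns an empty String both when nothing matches and when the matched node has no text, so asking for a Node instead separates the two cases. The class name XPathTableCheck, the helper firstMatch and the two sample documents are hypothetical stand-ins. Whether the real JTidy output declares the XHTML namespace is an assumption, since only the <table> tag of tidyHtml is shown here.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class XPathTableCheck {

    // Hypothetical stand-ins for the JTidy output, not the real page.
    static final String PLAIN =
        "<html><body><table id=\"searchResult\"><tr><td>row</td></tr></table></body></html>";
    static final String NAMESPACED =
        "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body>"
        + "<table id=\"searchResult\"><tr><td>row</td></tr></table></body></html>";

    // The expression from the question, and a variant that ignores namespaces.
    static final String BY_NAME = "//table[@id='searchResult']";
    static final String BY_LOCAL_NAME = "//*[local-name()='table' and @id='searchResult']";

    // Returns the first node matched by the expression, or null if nothing matches.
    static Node firstMatch(String xml, String expression) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // namespaces are visible to XPath name tests
        Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        XPath xPath = XPathFactory.newInstance().newXPath();
        return (Node) xPath.evaluate(expression, doc, XPathConstants.NODE);
    }

    public static void main(String[] args) throws Exception {
        // No namespace: the unprefixed name test finds the table.
        System.out.println(firstMatch(PLAIN, BY_NAME) != null);            // true
        // Default XHTML namespace: the unprefixed name test finds nothing.
        System.out.println(firstMatch(NAMESPACED, BY_NAME) != null);       // false
        // Matching on local-name() works regardless of the namespace.
        System.out.println(firstMatch(NAMESPACED, BY_LOCAL_NAME) != null); // true
    }
}
```

If the real tidyHtml behaves like the second sample, a default namespace on the root element would be one possible explanation for the empty result. If it behaves like the first, the cause is somewhere else.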
I haven't used XPath before so maybe I'm missing something.
Can anyone set me straight? Thanks.