If I have HTML that looks like:
<td class="blah">&nbs;<a href="http://.....">????</a>&nbsp;</td>
Could I get the ???? value using XPath? What would the expression look like?
Why would you use an XML parser to parse HTML? I would suggest a dedicated Java HTML parser instead; there are many, though I haven't tried any myself.
As for whether it would work: I suspect not. You will get an error as soon as the XML parser reaches &nbs;, if not earlier.
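To illustrate the point, here is a minimal sketch that feeds the snippet from the question to the XML parser bundled with the JDK. The class and method names (StrictParseCheck, parsesAsXml) are made up for this example, and the second input is an assumed well-formed variant in which the entities are replaced by the numeric reference &#160;.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class StrictParseCheck {

    // Returns true if the JDK's XML parser accepts the markup as well-formed XML.
    static boolean parsesAsXml(String markup) {
        try {
            DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            // Not well-formed, for example because of an undeclared entity.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The snippet exactly as posted in the question.
        String asPosted =
                "<td class=\"blah\">&nbs;<a href=\"http://.....\">????</a>&nbsp;</td>";
        // Assumed variant: numeric character references need no declaration.
        String wellFormed =
                "<td class=\"blah\">&#160;<a href=\"http://.....\">????</a>&#160;</td>";

        System.out.println(parsesAsXml(asPosted));   // prints false
        System.out.println(parsesAsXml(wellFormed)); // prints true
    }
}
```

The parser may also log a "[Fatal Error]" line to standard error for the first input; that comes from its default error handler.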
To use XPath you usually need XML, not HTML, but some parsers (e.g. the one built into PHP) have a relaxed mode that will parse most HTML too.
If you want to find all <a> elements that are direct children of <td class="blah">, the XPath you need is
//td[@class = 'blah']/a
or
//td[@class = 'blah']/a[@href = 'http://...']
(depending on whether you want all URLs or only the one URL)
This will give you a set of nodes. You'll need to iterate through it and, for each node, check the nodeType of its firstChild (it should be a text node) and the number of child nodes (it should be 1). The firstChild will then contain the ????.
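The iteration described above could be sketched in Java with the standard javax.xml.xpath API, roughly as follows. This is only an illustration: the class and method names (LinkTextExtractor, linkTexts) are invented here, the sample markup and its http://example.com/ URL are hypothetical, and the input is assumed to be well-formed XML already (with &#160; in place of the entities).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class LinkTextExtractor {

    // Returns the text of every <a> that is a direct child of <td class="blah">.
    static List<String> linkTexts(String xml) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList links;
        try {
            links = (NodeList) xpath.evaluate(
                    "//td[@class = 'blah']/a",
                    new InputSource(new StringReader(xml)),
                    XPathConstants.NODESET);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }

        List<String> texts = new ArrayList<>();
        for (int i = 0; i < links.getLength(); i++) {
            Node link = links.item(i);
            Node first = link.getFirstChild();
            // Accept only links whose single child is a text node.
            if (link.getChildNodes().getLength() == 1
                    && first.getNodeType() == Node.TEXT_NODE) {
                texts.add(first.getNodeValue());
            }
        }
        return texts;
    }

    public static void main(String[] args) {
        // Hypothetical, well-formed sample input.
        String xml = "<table><tr><td class=\"blah\">&#160;"
                + "<a href=\"http://example.com/\">link text</a>&#160;</td></tr></table>";
        System.out.println(linkTexts(xml)); // prints [link text]
    }
}
```

Links that contain nested markup (say, an <img> or <b> inside the <a>) are skipped by the child-count check; if you want their text as well, getTextContent() on the link node is one alternative.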