I'm trying to parse a well-formed XHTML document, but I'm having problems iterating over the nodes.
My XHTML has a structure like this:

<?xml version="1.0" encoding="UTF-8"?>
<html>
  <head>...</head>
  <body>
    ...
    <form>
      ...
      <div class="AB">    (1 or 2 times)
        ...
        <div class="CD">
          ...
          <table>
            <tbody>
              <tr>    (1 to N times)
                <td> XXX </td>
                <td> YYY </td> ...

The information I need is in the table cells (td). I want to construct N objects: each row (tr) holds, in its cells, the data needed to construct one object.
There are 1 or 2 divs with class="AB", so I'll end up with 1 or 2 AB objects, each containing a list of objects built from the rows of its table.
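The per-row mapping described above can be sketched with plain DOM calls. This is a minimal sketch, not the asker's code: `RowMapping`, `RowData`, `AB` and the two-field layout are hypothetical, and it assumes every `tr` has at least two `td` cells and no nested tables.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class RowMapping {
    // Hypothetical value object: one instance per <tr>, one field per <td>.
    static final class RowData {
        final String first;
        final String second;

        RowData(String first, String second) {
            this.first = first;
            this.second = second;
        }
    }

    // Hypothetical container: one instance per <div class="AB">, holding its rows.
    static final class AB {
        final List<RowData> rows = new ArrayList<>();
    }

    // Builds a RowData from the trimmed text of the first two <td> elements under a <tr>.
    // Assumes the row has at least two cells and contains no nested tables.
    static RowData fromRow(Element tr) {
        NodeList cells = tr.getElementsByTagName("td");
        return new RowData(cells.item(0).getTextContent().trim(),
                           cells.item(1).getTextContent().trim());
    }

    // Helper for trying it out: parses a standalone <tr> fragment and maps it.
    static RowData parseRow(String trXml) throws Exception {
        Element tr = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(trXml)))
                .getDocumentElement();
        return fromRow(tr);
    }

    public static void main(String[] args) throws Exception {
        AB ab = new AB();
        ab.rows.add(parseRow("<tr><td> XXX </td><td> YYY </td></tr>"));
        RowData row = ab.rows.get(0);
        System.out.println(row.first + " / " + row.second); // XXX / YYY
    }
}
```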

First, I extract a NodeList of these AB divs:

NodeList ABlist = (NodeList) xpath.evaluate("//div[@class='AB']", document, XPathConstants.NODESET);
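For reference, a self-contained version of this first step. It assumes a default (non-namespace-aware) `DocumentBuilderFactory` and a hypothetical sample document shaped like the structure above, with two AB divs; the class and method names are illustrative only.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class SelectAbDivs {
    // Hypothetical sample shaped like the structure in the question (no namespace).
    static final String SAMPLE =
        "<html><head/><body><form>"
      + "<div class=\"AB\"><div class=\"CD\"><table><tbody>"
      + "<tr><td>XXX</td><td>YYY</td></tr>"
      + "</tbody></table></div></div>"
      + "<div class=\"AB\"><div class=\"CD\"><table><tbody>"
      + "<tr><td>XXX</td><td>YYY</td></tr>"
      + "</tbody></table></div></div>"
      + "</form></body></html>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    static int countAbDivs(String xml) throws Exception {
        Document document = parse(xml);
        XPath xpath = XPathFactory.newInstance().newXPath();
        // The Document itself is the context node; '//' searches the whole tree.
        NodeList abList = (NodeList) xpath.evaluate(
                "//div[@class='AB']", document, XPathConstants.NODESET);
        return abList.getLength();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countAbDivs(SAMPLE)); // 2
    }
}
```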

Next, I try to get a NodeList of all the tr elements inside the first AB div:

NodeList trList = (NodeList) xpath.evaluate("/div/table//tr", ABlist.item(0), XPathConstants.NODESET);

In this case the trList is empty. Do you know what's wrong with my code?
Thank you

A: 

Are you sure this is XHTML? Your sample document doesn't declare a namespace, and without the XHTML namespace it isn't XHTML. If there is a namespace and you left it out of the sample for brevity, then your XPath expressions need to reference that namespace too; otherwise they won't select anything.
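To illustrate the namespace point, here is a minimal sketch assuming the document declares the XHTML namespace and is parsed with `setNamespaceAware(true)`. The prefix `x`, the sample markup and the class name are arbitrary choices for illustration; any prefix works as long as the `NamespaceContext` maps it to `http://www.w3.org/1999/xhtml`.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XhtmlNamespaceXPath {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Hypothetical sample: similar markup, but with the XHTML namespace declared.
    static final String SAMPLE =
        "<html xmlns=\"" + XHTML_NS + "\"><body><div class=\"AB\"/></body></html>";

    static int countMatches(String expression) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // off by default
        Document document = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(SAMPLE)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        // Bind the arbitrary prefix "x" to the XHTML namespace.
        xpath.setNamespaceContext(new NamespaceContext() {
            @Override
            public String getNamespaceURI(String prefix) {
                return "x".equals(prefix) ? XHTML_NS : XMLConstants.NULL_NS_URI;
            }

            @Override
            public String getPrefix(String namespaceURI) {
                return XHTML_NS.equals(namespaceURI) ? "x" : null;
            }

            @Override
            public Iterator<String> getPrefixes(String namespaceURI) {
                return XHTML_NS.equals(namespaceURI)
                        ? Collections.singletonList("x").iterator()
                        : Collections.<String>emptyIterator();
            }
        });

        NodeList nodes = (NodeList) xpath.evaluate(expression, document, XPathConstants.NODESET);
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countMatches("//div[@class='AB']"));   // 0: unprefixed name means "no namespace"
        System.out.println(countMatches("//x:div[@class='AB']")); // 1
    }
}
```

The JDK's built-in `javax.xml.xpath` implementation is XPath 1.0, where an unprefixed name test such as `div` only matches elements in no namespace and there is no default element namespace. That is why the first expression finds nothing once the document is parsed namespace-aware.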

skaffman
Hi skaffman, I'm retrieving the ABlist of divs correctly; it's only the way I extract trList that isn't working. Actually, you're right: the document doesn't declare any namespace, so maybe it should only be called XML. It conforms to the XML spec without specifying any namespace.
al nik
A:

The problem with your second, failing XPath expression is that it starts with a /:

/div/table//tr

In XPath, just as in file paths, starting a path with a / means "start from the root of the document". But you don't want that here; you want to start from your node. So:

div/table//tr

will do what you want.
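A self-contained comparison of the two expressions, both evaluated with the first AB div as the context node. The sample document and the class and method names are hypothetical; the sample gives the first AB div two rows and the second AB div one row.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class RelativeXPath {
    // Hypothetical sample: first AB div has 2 rows, second AB div has 1 row.
    static final String SAMPLE =
        "<html><head/><body><form>"
      + "<div class=\"AB\"><div class=\"CD\"><table><tbody>"
      + "<tr><td>XXX</td><td>YYY</td></tr>"
      + "<tr><td>XXX</td><td>YYY</td></tr>"
      + "</tbody></table></div></div>"
      + "<div class=\"AB\"><div class=\"CD\"><table><tbody>"
      + "<tr><td>XXX</td><td>YYY</td></tr>"
      + "</tbody></table></div></div>"
      + "</form></body></html>";

    static int countRowsInFirstAb(String trExpression) throws Exception {
        Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(SAMPLE)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList abList = (NodeList) xpath.evaluate(
                "//div[@class='AB']", document, XPathConstants.NODESET);
        // The second argument is the context node; only relative paths start from it.
        NodeList trList = (NodeList) xpath.evaluate(
                trExpression, abList.item(0), XPathConstants.NODESET);
        return trList.getLength();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countRowsInFirstAb("/div/table//tr")); // 0
        System.out.println(countRowsInFirstAb("div/table//tr"));  // 2
    }
}
```

The absolute form asks for a `div` child of the document root, but the root's only child element is `html`, so it returns an empty NodeList.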

Pavel Minaev
You're right, Pavel! I thought I was passing the 'context' to the evaluate() method as the second parameter. I think I tried it without the / before posting here, but maybe I had also changed something else in the meantime, so it didn't work then. Anyway, it's working now. Thanks a lot for your help!
al nik
You _are_ passing the context there. The problem is that by using a leading `/` in the query, you're telling it to start the path not from the context node but from the _root_ of the document to which the node belongs.
Pavel Minaev
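The same rule applies to a leading `//`: it is also anchored at the document root, so `//tr` returns rows from every AB div regardless of the context node, while `.//tr` stays inside it. A sketch, again with a hypothetical sample in which the first AB div has two rows and the second has one; the class and method names are illustrative.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ContextVsRoot {
    // Hypothetical sample: first AB div has 2 rows, second AB div has 1 row.
    static final String SAMPLE =
        "<html><head/><body><form>"
      + "<div class=\"AB\"><div class=\"CD\"><table><tbody>"
      + "<tr><td>XXX</td><td>YYY</td></tr>"
      + "<tr><td>XXX</td><td>YYY</td></tr>"
      + "</tbody></table></div></div>"
      + "<div class=\"AB\"><div class=\"CD\"><table><tbody>"
      + "<tr><td>XXX</td><td>YYY</td></tr>"
      + "</tbody></table></div></div>"
      + "</form></body></html>";

    // Evaluates the expression with the first AB div as context and counts the matches.
    static int countFromFirstAb(String expression) throws Exception {
        Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(SAMPLE)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList abList = (NodeList) xpath.evaluate(
                "//div[@class='AB']", document, XPathConstants.NODESET);
        NodeList result = (NodeList) xpath.evaluate(
                expression, abList.item(0), XPathConstants.NODESET);
        return result.getLength();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countFromFirstAb("//tr"));  // 3: starts at the document root
        System.out.println(countFromFirstAb(".//tr")); // 2: starts at the context node
    }
}
```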