I'm trying to use dom4j to parse an xhtml document. If I simply print out the document I can see the entire document so I know it is being loaded correctly. The two divs that I'm trying to select are at the exact same level in the document.

html
  body
    div
     table
      tbody
       tr
        td
         table
           tbody
            tr
             td
              div class="definition"
              div class="example"

My code is

List<Element> list = document.selectNodes("//html/body/div/table/tbody/tr/td/table/tbody/tr/td");

but the list is empty when I do System.out.println(list);

If I only do List<Element> list = document.selectNodes("//html"); it does return a list with one element in it. So I'm confused about what's wrong with my XPath and why it won't find those divs.

+1  A: 

Try declaring the XHTML namespace for the XPath expression, e.g. bind it to the prefix x and use //x:html/x:body... as the XPath expression (see also this article, which is for Groovy rather than plain Java). Something like the following should probably do it in Java:

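import java.util.Map;
import java.util.TreeMap;
import org.dom4j.xpath.DefaultXPath;

// Bind the prefix "x" to the XHTML namespace so the XPath can match namespaced elements.
DefaultXPath xpath = new DefaultXPath("//x:html/x:body/...");
Map<String,String> namespaces = new TreeMap<String,String>();
namespaces.put("x","http://www.w3.org/1999/xhtml");
xpath.setNamespaceURIs(namespaces);

list = xpath.selectNodes(document);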

(untested)
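
To see the effect in isolation, here is a minimal sketch of the same idea using the JDK's built-in javax.xml.xpath API instead of dom4j (swapped in only so that it runs without extra jars). The class name XhtmlNamespaceDemo, the sample markup, and the helper names parse and count are made up for illustration; the prefix x is arbitrary as long as it is bound to the XHTML namespace URI.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XhtmlNamespaceDemo {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Hypothetical sample; the real page from the question is not shown.
    static final String SAMPLE =
        "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body>"
        + "<div class=\"definition\">d</div><div class=\"example\">e</div>"
        + "</body></html>";

    // Parse the string into a namespace-aware DOM document.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample", e);
        }
    }

    // Count the nodes matched by the expression, with prefix "x" bound to XHTML.
    static int count(Document doc, String expression) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(new NamespaceContext() {
            public String getNamespaceURI(String prefix) {
                return "x".equals(prefix) ? XHTML_NS : XMLConstants.NULL_NS_URI;
            }
            public String getPrefix(String uri) {
                return XHTML_NS.equals(uri) ? "x" : null;
            }
            public Iterator<String> getPrefixes(String uri) {
                return XHTML_NS.equals(uri)
                    ? Collections.singletonList("x").iterator()
                    : Collections.<String>emptyIterator();
            }
        });
        try {
            NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(SAMPLE);
        System.out.println(count(doc, "//html/body/div"));       // prints 0
        System.out.println(count(doc, "//x:html/x:body/x:div")); // prints 2
    }
}
```

The unprefixed names only match elements in no namespace, which is why the first expression finds nothing once the document declares xmlns="http://www.w3.org/1999/xhtml".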

Andre Holzner
This worked perfectly! I didn't realize you could do that. I also had an extra div that I needed in the path. But I tried it again without the x: and it did not work that way, so your solution did it. I figured that parsing XHTML had issues compared with normal XML.
controlfreak123
+1  A: 

What about just "//div"? Or "//html/body/div/table/tbody"? I've found long literal XPath expressions hard to debug, as it's easy for my eyes to get tricked... so I break them down until it DOES work and then build back up again.
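
The break-it-down approach can also be automated. Below is a rough sketch, again with the JDK's javax.xml.xpath API rather than dom4j so that it runs without extra jars. XPathBreakdown, firstEmptyPrefix, and the sample markup are hypothetical names and data chosen for illustration. The sample is deliberately namespace-free and has no tbody, so only the stepwise technique is shown; for a namespaced XHTML file each step would also need a bound prefix.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathBreakdown {
    // Hypothetical sample: note that there is no tbody under table.
    static final String SAMPLE =
        "<html><body><div><table><tr><td>cell</td></tr></table></div></body></html>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample", e);
        }
    }

    // Grow the path one step at a time and return the first prefix that
    // matches nothing, or null if every prefix matches at least one node.
    static String firstEmptyPrefix(Document doc, String... steps) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        StringBuilder path = new StringBuilder();
        try {
            for (String step : steps) {
                path.append('/').append(step);
                NodeList hits = (NodeList) xpath.evaluate(
                        path.toString(), doc, XPathConstants.NODESET);
                if (hits.getLength() == 0) {
                    return path.toString();
                }
            }
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("bad XPath: " + path, e);
        }
        return null;
    }

    public static void main(String[] args) {
        Document doc = parse(SAMPLE);
        // prints /html/body/div/table/tbody
        System.out.println(firstEmptyPrefix(doc,
                "html", "body", "div", "table", "tbody", "tr", "td"));
    }
}
```

The returned prefix points at the first step that does not exist in the document, which is usually where a missing or extra element is hiding.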

Rodney Gitzel
That's what I was trying to do. That's how I caught the missing div. But unfortunately I still needed Andre's answer to make the path work, even after I had the elements in the right order.
controlfreak123
Ah, yes... I'd missed the 'xhtml' part, so if you have a namespace in the file, you'd definitely need it.
Rodney Gitzel