Hello, I have the following XML file:

<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<config>
 <a>
  <b>
   <param>p1</param> 
   <param>p2</param> 
  </b>
 </a>
</config>

and the XPath code to get my param nodes:

Document doc = ...;
XPath xpath = XPathFactory.newInstance().newXPath();
XPathExpression expr = xpath.compile("/config/a/b");
Object o = expr.evaluate(doc, XPathConstants.NODESET);
NodeList list = (NodeList) o;

but it turns out that the node list (list) has 5 children, including "\t\n", instead of just two. Is there something wrong with my code? How can I get just my two nodes?

Thank you!

+2  A: 

Select the text of b's child elements, so the XPath looks like /config/a/b/*/text(). The output of:

for (int i = 0; i < nodes.getLength(); i++) {
    System.out.println(nodes.item(i).getNodeValue());
}

would be as expected: p1 and p2
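
For reference, here is a minimal self-contained sketch of that approach. The class name ParamTextDemo, the helper paramValues, and the inlined XML string (without the XML declaration) are assumptions made for illustration; the question loads its Document some other way. Run against the sample file, it should print the two values p1 and p2.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ParamTextDemo {
    // The sample document from the question, inlined for this sketch.
    static final String XML =
            "<config>\n"
          + " <a>\n"
          + "  <b>\n"
          + "   <param>p1</param> \n"
          + "   <param>p2</param> \n"
          + "  </b>\n"
          + " </a>\n"
          + "</config>";

    // Hypothetical helper: returns the text of every element child of b.
    static List<String> paramValues(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        // text() here matches the text nodes inside the param elements,
        // not the whitespace text nodes that are direct children of b.
        NodeList nodes = (NodeList) xpath.evaluate(
                "/config/a/b/*/text()", doc, XPathConstants.NODESET);
        List<String> values = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            values.add(nodes.item(i).getNodeValue());
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(paramValues(XML));
    }
}
```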

Flueras Bogdan
+2  A: 

How about

/config/a/b/*/text()/..

?
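
A rough sketch of what this expression gives you: the trailing /.. steps back from each text node to its parent, so the result should be the param elements themselves rather than their text. The class name ParentOfTextDemo, the helper paramElements, and the inlined XML string are hypothetical, chosen only for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ParentOfTextDemo {
    // The sample document from the question, inlined for this sketch.
    static final String XML =
            "<config>\n"
          + " <a>\n"
          + "  <b>\n"
          + "   <param>p1</param> \n"
          + "   <param>p2</param> \n"
          + "  </b>\n"
          + " </a>\n"
          + "</config>";

    // Hypothetical helper: describes each selected node as "name=text".
    static List<String> paramElements(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Select the text nodes inside b's element children, then step up
        // to their parents, which are the param elements.
        NodeList nodes = (NodeList) xpath.evaluate(
                "/config/a/b/*/text()/..", doc, XPathConstants.NODESET);
        List<String> described = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Node n = nodes.item(i);
            described.add(n.getNodeName() + "=" + n.getTextContent());
        }
        return described;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(paramElements(XML));
    }
}
```

Note that an element child of b with no text inside it would not be matched by this expression, since it has no text node to step back from.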

Ted Dziuba
+1  A: 

/config/a/b selects the b element itself, and when you then walk its children you get all of them, which includes three text nodes and two elements. That is, given your XML above and only showing the fragment in question:

<b>
 <param>p1</param> 
 <param>p2</param> 
</b>

the first child is the text (whitespace) following <b> and preceding the first <param>. The second child is the first param element. The third child is the text (whitespace) between the two param elements. And so on. Whitespace isn't ignored in XML itself, although many tools that process XML discard it.

You have a couple of choices:

  1. Change your XPath expression so it will only select element nodes, as suggested by Ted Dziuba, or
  2. Loop over the five nodes returned and only select the non-text nodes.

You could do something like this:

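for (int i = 0; i < nodes.getLength(); i++) {
    // Keep only element nodes; this skips the whitespace text nodes.
    if (nodes.item(i).getNodeType() == Node.ELEMENT_NODE) {
        // getNodeValue() is null for elements, so read the text content instead.
        System.out.println(nodes.item(i).getTextContent());
    }
}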

You can use the node type to select only element nodes, or to remove text nodes.
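
As a fuller sketch of the second choice, assuming the five nodes come from calling getChildNodes() on the b element: the class name ElementFilterDemo, its two helpers, and the inlined XML string are made up for this example and are not from the question.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ElementFilterDemo {
    // The sample document from the question, inlined for this sketch.
    static final String XML =
            "<config>\n"
          + " <a>\n"
          + "  <b>\n"
          + "   <param>p1</param> \n"
          + "   <param>p2</param> \n"
          + "  </b>\n"
          + " </a>\n"
          + "</config>";

    // Hypothetical helper: evaluates /config/a/b and returns the single b element.
    static Node selectB(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        return (Node) xpath.evaluate("/config/a/b", doc, XPathConstants.NODE);
    }

    // Counts every child of b, whitespace text nodes included.
    static int childCount(String xml) throws Exception {
        return selectB(xml).getChildNodes().getLength();
    }

    // Collects the text content of b's element children only.
    static List<String> elementTexts(String xml) throws Exception {
        NodeList children = selectB(xml).getChildNodes();
        List<String> texts = new ArrayList<>();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                texts.add(child.getTextContent());
            }
        }
        return texts;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(childCount(XML));   // 5: three text nodes, two elements
        System.out.println(elementTexts(XML)); // [p1, p2]
    }
}
```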

Eddie
A: 

I am not sure, but shouldn't /config/a/b just return b? /config/a/b/param should return the two param nodes.

Could the way you are looking at the result be the problem? You get back the matching node, and all of its children hang off it. So you just have to look at the returned element itself and not at its children.

But I could be totally wrong, because I usually just use XPath to navigate DOM trees (HtmlUnit).
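
One way to check this is to count what each expression matches. The sketch below is only an illustration: the class name XPathCountDemo, the helper count, and the inlined XML string are assumptions, not code from the question. Against the sample file, /config/a/b should match one node, /config/a/b/param two, and /config/a/b/node() (every child of b, whitespace included) five.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathCountDemo {
    // The sample document from the question, inlined for this sketch.
    static final String XML =
            "<config>\n"
          + " <a>\n"
          + "  <b>\n"
          + "   <param>p1</param> \n"
          + "   <param>p2</param> \n"
          + "  </b>\n"
          + " </a>\n"
          + "</config>";

    // Hypothetical helper: number of nodes an XPath expression selects.
    static int count(String xml, String expression) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(
                expression, doc, XPathConstants.NODESET);
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        String[] expressions = {
            "/config/a/b",        // the b element itself
            "/config/a/b/param",  // only the param elements
            "/config/a/b/node()"  // every child of b, text nodes included
        };
        for (String expression : expressions) {
            System.out.println(expression + " -> " + count(XML, expression));
        }
    }
}
```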

ReneS