I want to check to see if an XML document contains a 'person' element anywhere inside. I can check all the first-generation elements very simply:

NodeList nodeList = root.getChildNodes();
for(int i=0; i<nodeList.getLength(); i++){
  Node childNode = nodeList.item(i);
  if ("person".equals(childNode.getNodeName())) {
     //do something with it
  }
}

I can add more loops to go into subelements, but I would have to know in advance how many nested loops are needed to reach a given depth. I could nest 10 loops and still miss a person element nested 12 levels deep in a given document. I need to be able to pull out the element no matter how deeply nested it is.

Is there a way to harvest elements from an entire document? For example, to return the text values of all matching tags as an array, or to iterate over them?

Something akin to Python's ElementTree 'findall' method, perhaps:

for person in tree.findall('//person'):
   personlist.append(person)
+2  A: 

As mmyers states, you could use recursion for this problem.

doSomethingWithAll(root.getChildNodes());

void doSomethingWithAll(NodeList nodeList)
{
    for(int i=0; i<nodeList.getLength(); i++){
      Node childNode = nodeList.item(i);
      // compare strings with equals(); == only compares object references
      if ("person".equals(childNode.getNodeName())) {
         //do something with it
      }

      NodeList children = childNode.getChildNodes();
      if (children != null)
      {
         doSomethingWithAll(children);
      }
    }
}
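
For anyone who wants something to paste and run, here is a minimal self-contained sketch of the same recursive walk that collects the text of every matching element into a list, which is roughly what the question asks for. The class name RecursivePersonFinder, the helper names, and the sample XML are made up for illustration; only the org.w3c.dom and javax.xml.parsers calls are standard JDK API.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class, not part of any library.
class RecursivePersonFinder {

    // Walks everything below `node` depth-first and appends the text
    // content of each element whose name equals `tagName`.
    static void collect(Node node, String tagName, List<String> out) {
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE
                    && tagName.equals(child.getNodeName())) {
                out.add(child.getTextContent());
            }
            // Recurse regardless of the match, so nesting depth does not matter.
            collect(child, tagName, out);
        }
    }

    // Parses an XML string into a DOM Document (sample helper for the demo).
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root><person>Ann</person>"
                + "<group><team><person>Bob</person></team></group></root>";
        List<String> names = new ArrayList<>();
        collect(parse(xml), "person", names);
        System.out.println(names); // prints [Ann, Bob]
    }
}
```

Note that getTextContent() returns the concatenated text of the whole subtree, so if person elements can contain other person elements you would probably want to read only the direct text nodes instead.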
+2  A: 

That's what XPath is for. To get all elements named "person", here's the expression:

//person
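
As a rough sketch of evaluating that expression with nothing but the JDK's javax.xml.xpath package, something like the following should work. The class name XPathPersonFinder, the helper methods, and the sample XML are invented for this example; treat it as one possible way to wire things up, not the recommended one.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class, not part of any library.
class XPathPersonFinder {

    // Evaluates an XPath expression against the document and returns the
    // matching nodes. NODESET asks the evaluator for a NodeList result.
    static NodeList findAll(Document doc, String expression) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
    }

    // Parses an XML string into a DOM Document (sample helper for the demo).
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root><person>Ann</person>"
                + "<group><team><person>Bob</person></team></group></root>";
        NodeList people = findAll(parse(xml), "//person");
        for (int i = 0; i < people.getLength(); i++) {
            // prints Ann, then Bob
            System.out.println(people.item(i).getTextContent());
        }
    }
}
```

The cast and the XPathConstants.NODESET argument are the awkward part: evaluate() returns a plain Object, so the caller has to know which result type was requested.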

It can be painful to use the JDK's XPath APIs directly. I prefer the wrappers that I wrote in the Practical XML library: http://practicalxml.sourceforge.net/

And here's a tutorial that I wrote (on JDK XPath in general, but mentions XPathWrapper): http://www.kdgregory.com/index.php?page=xml.xpath

kdgregory
+4  A: 

I see three possibilities (two of which others have answered):

  1. Use recursion.
  2. Use XPath (might be a bit overkill for this problem, but if you have a lot of queries like this it is definitely something to explore). Use kdgregory's help on that; a quick look at the API suggests that it is a bit painful to use directly.
  3. If what you have is in fact a Document (that is, if root is a Document), you can use Document.getElementsByTagName.
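
A minimal sketch of option 3, assuming the XML is parsed into an org.w3c.dom.Document first. The class name TagNamePersonFinder, the helpers, and the sample XML are made up for the example. As far as I know, org.w3c.dom.Element declares a getElementsByTagName method too, which searches that element's descendants, so the same call should work if root is an Element.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class, not part of any library.
class TagNamePersonFinder {

    // Returns the text content of every element with the given tag name,
    // at any depth, in document order.
    static List<String> textOfAll(Document doc, String tagName) {
        NodeList matches = doc.getElementsByTagName(tagName);
        List<String> out = new ArrayList<>();
        for (int i = 0; i < matches.getLength(); i++) {
            out.add(matches.item(i).getTextContent());
        }
        return out;
    }

    // Parses an XML string into a DOM Document (sample helper for the demo).
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root><person>Ann</person>"
                + "<group><team><person>Bob</person></team></group></root>";
        System.out.println(textOfAll(parse(xml), "person")); // prints [Ann, Bob]
    }
}
```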
Kathy Van Stone
+1 -- #3 is definitely the simplest approach
kdgregory