ansaurus

Question

How can I use the XML DOM API to visit every non-text node?

Answer 1

A:

What do you know about the node you need to find? If you know exactly that it's:

A page element
It has a pagenumber attribute with value 500

then XPath is the way forward (assuming it's available on your platform - you haven't specified beyond "DOM"; most DOM implementations include XPath as far as I've seen).

In this case you'd use an XPath of:

//page[@pagenumber='500']

If you can't use XPath, please explain which DOM API you're using and we can try to come up with the best solution. Basically you'll probably end up iterating over every element node, checking whether its name is page and then checking whether it has an appropriate pagenumber attribute value.
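As a rough illustration of that fallback, here is a minimal sketch using Python's standard xml.dom.minidom. The sample document and the find_pages helper are assumptions made up for this example; they are not taken from the question, and other DOM APIs will differ in detail.

```python
from xml.dom import minidom

# Hypothetical sample document, invented for this sketch.
SAMPLE = """<bookstore>
  <book>
    <title>Example</title>
    <page pagenumber="499"/>
    <page pagenumber="500"/>
  </book>
</bookstore>"""


def find_pages(xml_text, number):
    """Return every <page> element whose pagenumber attribute equals number."""
    doc = minidom.parseString(xml_text)
    matches = []
    # "*" is the DOM wildcard: it matches every element in document order.
    for element in doc.getElementsByTagName("*"):
        if element.tagName == "page" and element.getAttribute("pagenumber") == number:
            matches.append(element)
    return matches


found = find_pages(SAMPLE, "500")
print(len(found))
```

The attribute value is compared as a string, because DOM attribute values are always strings.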

Jon Skeet 2009-04-06 08:09:10

Well, I guess I don't know what the node and its attributes are yet in this case. So what can I do then?

Jonathan 2009-04-06 08:10:30

So what *do* you know? What differentiates "the node that you want" from one that you don't?

Jon Skeet 2009-04-06 08:11:43

I don't know yet. This is supposed to work on all XML documents that have non-text nodes, so I can't make any assumptions about which node will come next. As long as it's a non-text node inside an XML document, I want to find it.

Jonathan 2009-04-06 08:14:35

@Jon: The property that differentiates the page node is that it contains no textual child elements. Please see my answer. I do, however, agree that knowledge of the DOM API being used here is possibly important for an accurate answer.

Cerebrus 2009-04-06 08:41:46

Ah. There's a huge difference between "node which doesn't contain any text nodes" and "non-text nodes". An element is a non-text node, even if it *contains* text. Deleting this answer as it's pointless now...
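To make that distinction concrete, here is a small sketch of a plain DOM walk that visits every non-text node, written against Python's xml.dom.minidom. The sample document and the non_text_nodes helper are assumptions for illustration only.

```python
from xml.dom import minidom, Node

# Hypothetical one-line document, invented for this sketch.
SAMPLE = "<book><title>XML</title><page/></book>"


def non_text_nodes(node):
    """Yield every descendant of node that is not a text node, in document order."""
    for child in node.childNodes:
        if child.nodeType != Node.TEXT_NODE:
            yield child
        # Recurse either way; text nodes simply have no children.
        yield from non_text_nodes(child)


doc = minidom.parseString(SAMPLE)
names = [n.nodeName for n in non_text_nodes(doc)]
print(names)
```

Here title is reported as a non-text node even though it contains text, because the element itself is not a text node. Attributes are not children in the DOM, so this walk does not visit them.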

Jon Skeet 2009-04-06 09:16:05

Answer 2

A:

Looks like you'll be needing an XPath. The W3Schools site has a good reference, but, assuming the page node always appears under a book node, the XPath /bookstore/book/page will return a node set with each page node in it. /bookstore/book/page[@pagenumber='500'] will get each page node where the pagenumber attribute has a value of 500.

The // syntax will find the node anywhere in the document without worrying about structure - this can be easier but is slower, especially with large documents. If you have a document with a known structure, it's best to use the explicit XPath.
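For comparison, here is a sketch of both styles using Python's xml.etree.ElementTree, which supports only a limited subset of XPath. The sample document is an assumption made up for this example. Paths are evaluated relative to the root element, so the leading /bookstore step is dropped.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this sketch.
SAMPLE = """<bookstore>
  <book>
    <page pagenumber="499"/>
    <page pagenumber="500"/>
  </book>
</bookstore>"""

root = ET.fromstring(SAMPLE)  # root is the <bookstore> element

# Explicit path: only looks at <page> children of <book> children of the root.
explicit = root.findall("book/page[@pagenumber='500']")

# Descendant search: looks for matching <page> elements at any depth.
anywhere = root.findall(".//page[@pagenumber='500']")

print(len(explicit), len(anywhere))
```

On this small document both queries find the same single element; the difference only matters for cost on large documents or when the structure is not known in advance.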

Graham Clark 2009-04-06 08:13:00

Thanks, but I don't know what the node is going to be like. I guess that's why I need to use the DOM.

Jonathan 2009-04-06 08:20:17

Answer 3

+1 A:

Your question basically seems to be: given an XML document, how do I find child nodes that do not have any text content?

A simple XPath expression such as:

/bookstore/book/*[count(child::text()) = 0]

or

/bookstore/book/*[not(text())]

will do it for you. Applying this XPath expression on the sample document will return a node-set containing both the page elements. You do not have to know the name of the page element beforehand, or even the names of all possible child elements of the book element, as you can see.

To explain: You need to query for child nodes of the book element that do not contain ANY text child nodes. The * step selects all child elements of the current node (child is the default axis), and the text() node test inside the predicate selects text nodes, so the predicate keeps only those children that have no text nodes of their own.

Edit: Note that if you want to query for non-text nodes in any XML document (as per your latest edit to the question), you should choose the answer provided by nils_gate. My answer was given prior to your edit and illustrates the concept, rather than providing a generic solution.

Cerebrus 2009-04-06 08:35:05

This seems to do the job. However, I can't figure out why you would have '/bookstore/book' in your XPath expression. FJ states the bookstore XML is an example XML doc from W3C and that he wants to do this for ANY XML. So wouldn't the solution be something like //*[count(child::text()) = 0] ?

Peter Perháč 2009-04-06 08:44:10

Good point, @Master. Apparently, FJ made that edit to his post *after* I had posted my answer.

Cerebrus 2009-04-06 08:46:35

Answer 4

+2 A:

XPATH = "//*[not(text())]"
This will select every element that has no text node of its own.
In the given example, bookstore and book are also selected, as they do not have any text of their own, though their children do have text.
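One caveat worth checking on your platform: in an indented document, the whitespace between tags is itself a text node unless the parser strips it, and in that case bookstore and book would not match not(text()). The sketch below mimics the predicate with a DOM walk in Python's xml.dom.minidom rather than running an XPath engine; the sample document and both helper functions are assumptions for illustration.

```python
from xml.dom import minidom, Node

# Hypothetical indented document, invented for this sketch.
PRETTY = """<bookstore>
  <book>
    <title>Example</title>
    <page pagenumber="500"/>
  </book>
</bookstore>"""


def has_text_child(element, ignore_whitespace):
    """True if element has a text-node child (optionally skipping blank ones)."""
    for child in element.childNodes:
        if child.nodeType == Node.TEXT_NODE:
            if ignore_whitespace and not child.data.strip():
                continue
            return True
    return False


def elements_without_text(xml_text, ignore_whitespace):
    """Names of all elements with no text-node children, in document order."""
    doc = minidom.parseString(xml_text)
    return [e.tagName for e in doc.getElementsByTagName("*")
            if not has_text_child(e, ignore_whitespace)]


strict = elements_without_text(PRETTY, ignore_whitespace=False)
lenient = elements_without_text(PRETTY, ignore_whitespace=True)
print(strict, lenient)
```

With whitespace kept, only page qualifies; with whitespace-only text ignored, bookstore, book and page all qualify, which matches the behaviour described above.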

nils_gate 2009-04-06 08:43:26

Good one. That's another (possibly more straightforward) way of writing the XPath.

Cerebrus 2009-04-06 08:49:00
