I want to extract information from a web page.

The page has m nodes, which can be found by .evaluate("//div[@class='news']", document, ....).

Each of these nodes contains 3 nodes, each with a different @class value. I want to extract these m 3-tuple records.

I tried to use the .evaluate() function as described in

https://developer.mozilla.org/en/Introduction_to_using_XPath_in_JavaScript

with this code:

var parentNodes = document.evaluate("//div[@class='news']", document, ....);
while (true){
   var node = parentNodes.iterateNext();
   var child = document.evaluate("//div[@class='title']", node, ....);
   ...
}

However, "child" is always assigned the first matching node in the document, instead of the first matching node within "node".

I ran this in the Firebug console.

Does anyone know what's wrong?

+2  A: 

Your XPath expression starts with "//", which makes it an absolute path, so it is evaluated from the root of the document tree no matter which context node you pass to evaluate. If you want XPath to select a node from within the current context, e.g. among the descendants of the current node, start the expression with the ".//" context selector.
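
For illustration, here is a minimal sketch of the loop from the question with the relative path applied. The function name `firstTitles` is hypothetical, the elided arguments are filled in with one plausible choice (no namespace resolver, explicit result types), and the numeric constants are the standard values of the corresponding `XPathResult` constants, written out so the function does not depend on a global.

```javascript
// Sketch only: `firstTitles` is a hypothetical helper, not code from the question.
function firstTitles(doc) {
  const ORDERED_NODE_ITERATOR_TYPE = 5; // XPathResult.ORDERED_NODE_ITERATOR_TYPE
  const FIRST_ORDERED_NODE_TYPE = 9;    // XPathResult.FIRST_ORDERED_NODE_TYPE

  const parentNodes = doc.evaluate("//div[@class='news']", doc, null,
                                   ORDERED_NODE_ITERATOR_TYPE, null);
  const titles = [];
  let node;
  // iterateNext() returns null once all matches have been visited.
  while ((node = parentNodes.iterateNext()) !== null) {
    // The leading "." anchors the search at `node` instead of the document root.
    const child = doc.evaluate(".//div[@class='title']", node, null,
                               FIRST_ORDERED_NODE_TYPE, null).singleNodeValue;
    titles.push(child);
  }
  return titles;
}
```

In a browser console this would be called as `firstTitles(document)`; each entry is the first `div.title` inside the corresponding `div.news`, or null if there is none.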

Anatoly Fayngelerin
+1  A: 

If you start an XPath expression with "/", you are searching down from the root node (the document node) of the tree that contains the context node. So instead of "//div[@class = 'title']" use "descendant::div[@class = 'title']"; that way you select the descendant div elements of the context node.
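
As a rough sketch of how the m 3-tuple records could be collected with the descendant axis: the question only names the class 'title', so the other two class names ('date' and 'summary'), the assumption that all three are div elements, and the function name `extractRecords` are hypothetical placeholders. The numeric constants are the standard values of the matching `XPathResult` constants.

```javascript
// Sketch only: 'date' and 'summary' are made-up class names; adjust to the real page.
function extractRecords(doc) {
  const ORDERED_NODE_SNAPSHOT_TYPE = 7; // XPathResult.ORDERED_NODE_SNAPSHOT_TYPE
  const FIRST_ORDERED_NODE_TYPE = 9;    // XPathResult.FIRST_ORDERED_NODE_TYPE
  const fieldClasses = ["title", "date", "summary"];

  // A snapshot result can be indexed directly, one item per div.news.
  const parents = doc.evaluate("//div[@class='news']", doc, null,
                               ORDERED_NODE_SNAPSHOT_TYPE, null);
  const records = [];
  for (let i = 0; i < parents.snapshotLength; i++) {
    const node = parents.snapshotItem(i);
    const record = {};
    for (const cls of fieldClasses) {
      // descendant:: is relative to the context node passed as the second argument.
      const path = "descendant::div[@class='" + cls + "']";
      const hit = doc.evaluate(path, node, null,
                               FIRST_ORDERED_NODE_TYPE, null).singleNodeValue;
      record[cls] = hit ? hit.textContent : null;
    }
    records.push(record);
  }
  return records;
}
```

Called as `extractRecords(document)` in a browser console, this would return one object per `div.news`, with null for any field that is missing.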

Martin Honnen
Both methods work like a charm. Thank you guys so much!
manova
BTW `descendant::div[@class = 'title']` is equivalent to `.//div[@class = 'title']`.
Tomalak