While parsing HTML with Yahoo Query Language (YQL) and its XPath support, I found that I could not extract “text()” nodes or attribute values.
For example, this query
select * from html where url="http://stackoverflow.com"
and xpath='//div/h3/a'
returns a list of anchors as XML:
<results>
<a class="question-hyperlink" href="/questions/661184/filling-the-text-area-with-the-text-when-a-button-is-clicked" title="In ASP.net, I need the code to fill the text area (in the form) when a button is clicked. Can you help me through by showing a simple .aspx code containing the script tag? ">Filling the text area with the text when a button is clicked</a>...
</results>
Now when I try to extract the node value using
select * from html where url="http://stackoverflow.com"
and xpath='//div/h3/a/text()'
I get the results concatenated into a single string rather than a node list, e.g.
<results>Xcode: attaching to a remote process for debuggingWhy is b
…… </results>
How do I get the text back as separate nodes, and how do I select attribute values?
A query like this
select * from html where url="http://stackoverflow.com"
and xpath='//div/h3/a[@href]'
gave me the same results as querying //div/h3/a.
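
For what it's worth, the last result matches plain XPath semantics as far as I can tell: a[@href] is a predicate that keeps the anchors which have an href attribute, it does not select the attribute itself. In XPath 1.0 the attribute would be addressed as //div/h3/a/@href, though I have not verified how YQL serializes that. Below is a minimal sketch outside YQL, using Python's standard xml.etree.ElementTree on a made-up snippet (the SAMPLE markup and its URLs are hypothetical stand-ins for the real page). ElementTree only supports a subset of XPath, so the text and the attribute are read from each element rather than selected with text() or @href.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the fetched page; the real markup will differ.
SAMPLE = """
<html><body>
  <div><h3><a class="question-hyperlink" href="/questions/1/first">First question</a></h3></div>
  <div><h3><a class="question-hyperlink" href="/questions/2/second">Second question</a></h3></div>
</body></html>
""".strip()

root = ET.fromstring(SAMPLE)

# Every anchor under div/h3.
all_anchors = root.findall(".//div/h3/a")

# [@href] is a predicate: it filters anchors that HAVE an href,
# so the result is still a list of <a> elements, not attribute values.
with_href = root.findall(".//div/h3/a[@href]")

# Both queries return the very same element objects for this sample.
same_nodes = all_anchors == with_href

# Reading text and attributes per element keeps the values separate
# instead of concatenated into one string.
titles = [a.text for a in all_anchors]
hrefs = [a.get("href") for a in all_anchors]

print(same_nodes)
print(titles)
print(hrefs)
```

This only illustrates the XPath behaviour; it says nothing about how YQL itself formats text() or attribute results.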