XQuery to extract text | ansaurus

Q:

XQuery to extract text

I am extracting text from HTML documents and storing it in a database, using the Web-Harvest tool to extract the content. However, I am stuck at one point. Inside Web-Harvest I use an XQuery expression to extract the data. The HTML I am parsing looks like this:

 <td><a name="hw">HELLOWORLD</a>Hello world</td>

I need to extract the text "Hello world" from the HTML above.

I have tried extracting the text in this fashion:

  $hw := data($item//a[@name='hw']/text())

However, what I always get is "HELLOWORLD" instead of "Hello world".

Is there a way to extract "Hello world"? Please help.

What if the markup looks like this instead:

     <td>
       <a name="hw1">HELLOWORLD1</a>Hello world1
       <a name="hw2">HELLOWORLD2</a>Hello world2
       <a name="hw3">HELLOWORLD3</a>Hello world3
     </td>

I would like to extract the text "Hello world2" that sits between hw2 and hw3. I would rather not use text()[3]; is there some way to extract the text between /a[@name='hw2'] and /a[@name='hw3']?

A:

First of all, you are looking for the a elements whose name attribute starts with 'hw'. This can be achieved with the following path:

$item//a[starts-with(@name,'hw')]
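To see what this path selects, here is a minimal sketch. It assumes $item is bound to the td element from the question (built inline here purely for illustration; in Web-Harvest it would come from the parsed page) and returns the name attribute of every matching anchor:

```xquery
(: Sample input copied from the question; the inline constructor is an
   assumption made so the query is self-contained. :)
let $item :=
  <td>
    <a name="hw1">HELLOWORLD1</a>Hello world1
    <a name="hw2">HELLOWORLD2</a>Hello world2
    <a name="hw3">HELLOWORLD3</a>Hello world3
  </td>
return
  (: one string per matching anchor: "hw1", "hw2", "hw3" :)
  for $a in $item//a[starts-with(@name, 'hw')]
  return string($a/@name)
```

If only one anchor is wanted, an exact comparison such as a[@name = 'hw2'] can replace the starts-with test.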

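Once you have found your a elements, you want to retrieve the first text node that follows each one. The text "Hello world" is a sibling of the a element, not a child of it, which is why a/text() returns "HELLOWORLD". The following sibling can be selected like so: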

$item//a[starts-with(@name,'hw')]/following-sibling::text()[1]
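For the second case in the question (only the text between hw2 and hw3), the same axis can be anchored on a single element. The sketch below is an illustration rather than a tested Web-Harvest configuration: the inline td is copied from the question, and the variable names $start and $end are made up for this example. It shows two variants, the nearest text node after hw2, and every text sibling that lies after hw2 but before hw3 in document order (using the << node comparison). normalize-space trims the line breaks and indentation that surround the text in the source.

```xquery
(: Sample input copied from the question; in Web-Harvest, $item would
   instead be bound to the parsed page content. :)
let $item :=
  <td>
    <a name="hw1">HELLOWORLD1</a>Hello world1
    <a name="hw2">HELLOWORLD2</a>Hello world2
    <a name="hw3">HELLOWORLD3</a>Hello world3
  </td>
(: $start and $end are illustrative names, not from the original post. :)
let $start := $item//a[@name = 'hw2']
let $end := $item//a[@name = 'hw3']
return (
  (: variant 1: the nearest text node after the hw2 anchor :)
  normalize-space($start/following-sibling::text()[1]),
  (: variant 2: all text siblings after hw2 and before hw3, joined :)
  normalize-space(
    string-join($start/following-sibling::text()[. << $end], ''))
)
```

On this input both variants yield "Hello world2". The second form is the safer choice if the text between the two anchors might be split into several text nodes, for example by a comment or an inline element.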

Oliver Hallam 2010-06-23 13:09:00

Thank you so much, problem solved.

Technocrat 2010-06-23 13:37:07
