Hi all!

I have a search bar in my web site that searches for all the pages in the web site that contain a particular keyword. This is achieved by querying an Indexing Server catalog.

My question is as follows: suppose I search for the word "ASP.NET" and get, say, 3 pages that contain an occurrence of "ASP.NET".

I want to display the line in which the keyword "ASP.NET" is found, so that the user gets some context for each result.

Can anyone help me, please? It's quite urgent. Thanks in advance!

A: 

Using System.Xml.Linq, read the page into an XDocument. Use LINQ to query the XDocument for the text, then take the matching XElement and inspect it further.
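
Where LINQ is not available (for example on .NET 2.0), roughly the same idea can be sketched with XmlDocument and an XPath query. This is only a sketch: the class and method names (XhtmlKeywordFinder, FindText) are made up for illustration, and it assumes the page is well-formed XHTML, since LoadXml throws an XmlException otherwise.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Hypothetical helper, not part of any library.
static class XhtmlKeywordFinder
{
    // Returns the trimmed content of every text node that contains the
    // keyword (case-insensitive). Assumes well-formed XML/XHTML input.
    public static List<string> FindText(string xhtml, string keyword)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xhtml);

        List<string> hits = new List<string>();
        // "//text()" selects all text nodes, whatever element they sit in.
        foreach (XmlNode node in doc.SelectNodes("//text()"))
        {
            string text = node.Value.Trim();
            if (text.IndexOf(keyword, StringComparison.OrdinalIgnoreCase) >= 0)
                hits.Add(text);
        }
        return hits;
    }
}
```

Calling XhtmlKeywordFinder.FindText(pageSource, "ASP.NET") would give one string per matching text node, which can then be shown next to the search result. Text inside script or style elements is not filtered out here.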

Mike
Hi Mike! Thanks for the reply, but sorry, I forgot to mention that I am using .NET 2.0 and have no LINQ support. Is there any way to do this without LINQ?
Suraj
This would work great, if the web pages are valid XHTML. Otherwise, trying to read them as XML will cause an exception.
driis
If you can't use LINQ, then XmlDocument will help you.
Braveyard
Do you have an example of how to do this using XmlDocument, please?
Suraj
A: 

Try parsing the document, finding the occurrence(s) of the search term, and then extracting the surrounding text. This can be done by taking all the text inside the same tag, or all the text in the same sentence. You could do that with a regular expression.

Which works best depends on your needs and the structure of the content. You could also include surrounding sentences in order to achieve a minimum length of the extracted text.

Here is an example that extracts the runs of text (the text between two tags) containing the word "question" from this question's page. It is by no means perfect, but it illustrates the concept and should get you started:

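using System;
using System.Net;
using System.Text.RegularExpressions;

class Program
{
    private const string url =
        "http://stackoverflow.com/questions/1655313/get-the-static-text-contents-of-a-web-page";
    private const string keyword = "question";

    // Matches a run of text between two tags that contains the keyword ({0}).
    private const string regexTemplate = ">([^<>]*?{0}[^<>]*?)<";

    static void Main(string[] args)
    {
        string html;
        // WebClient is IDisposable, so release it once the download is done.
        using (WebClient client = new WebClient())
        {
            html = client.DownloadString(url);
        }

        // Escape the keyword so characters such as '.' in "ASP.NET" are matched literally.
        Regex regex = new Regex(
            string.Format(regexTemplate, Regex.Escape(keyword)), RegexOptions.IgnoreCase);

        // MatchCollection is spelled out (instead of var) so this also compiles on .NET 2.0.
        MatchCollection matches = regex.Matches(html);
        foreach (Match match in matches)
            Console.WriteLine(match.Groups[1].Value);
    }
}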
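
The example above returns all the text inside one tag. For the sentence-based option, a rough sketch could strip the tags first and then keep only the sentences that contain the keyword. The names SentenceSnippets and Extract are made up for illustration, the tag removal is deliberately crude (script and style contents are not filtered, HTML entities are not decoded), and sentences are assumed to end in '.', '!' or '?' followed by whitespace.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
static class SentenceSnippets
{
    public static List<string> Extract(string html, string keyword)
    {
        // Replace each tag with a space so adjacent words do not run together,
        // then collapse runs of whitespace into single spaces.
        string text = Regex.Replace(html, "<[^>]+>", " ");
        text = Regex.Replace(text, @"\s+", " ").Trim();

        List<string> result = new List<string>();
        // Split at whitespace that follows sentence-ending punctuation.
        // The '.' inside "ASP.NET" is not followed by whitespace, so it stays intact.
        foreach (string sentence in Regex.Split(text, @"(?<=[.!?])\s+"))
        {
            if (sentence.IndexOf(keyword, StringComparison.OrdinalIgnoreCase) >= 0)
                result.Add(sentence);
        }
        return result;
    }
}
```

To reach a minimum snippet length, the loop could be extended to append the neighbouring sentences when a match is too short.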
driis