Is there any reliable method to find the collection of links that lead to the detail pages of news items? In other words, after visiting the front page of a website, I only want the links that point to a news item. Any solution?

A: 

If it is for one specific website, you could always fetch the HTML of the site and extract the links to the news articles with regular expressions. Just find patterns in the HTML that your code can use to identify where the links are.

I have done this a couple of times to scrape information from a website.
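
A minimal sketch of that regex approach, assuming the target site puts all its articles under a common path fragment such as `/news/` (a hypothetical pattern; you would read the real one off the site's HTML). `NewsLinkExtractor` and `Extract` are illustrative names, not an existing API. A regex like this breaks easily on arbitrary HTML, so it is only reasonable for one site whose markup you have inspected.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NewsLinkExtractor
{
    // Captures the value of href="..." or href='...' inside an <a ...> tag.
    private static readonly Regex HrefPattern = new Regex(
        @"<a\s[^>]*?href\s*=\s*[""']([^""']+)[""']",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // Returns the distinct hrefs that contain pathFragment (e.g. "/news/",
    // a hypothetical site-specific pattern), in document order.
    public static List<string> Extract(string html, string pathFragment)
    {
        var result = new List<string>();
        var seen = new HashSet<string>();
        foreach (Match m in HrefPattern.Matches(html))
        {
            string href = m.Groups[1].Value;
            bool matches = href.IndexOf(pathFragment, StringComparison.OrdinalIgnoreCase) >= 0;
            if (matches && seen.Add(href))
                result.Add(href);
        }
        return result;
    }
}
```

Calling `NewsLinkExtractor.Extract(html, "/news/")` on the downloaded page returns only the links under that path, with duplicates removed.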

But maybe an obvious question: is there no RSS feed available on the website?
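
If a feed exists (sites often advertise it in the page head with `<link rel="alternate" type="application/rss+xml" ...>`), its items usually point straight at the articles, so no guessing is needed. A minimal sketch using `System.Xml.Linq`, assuming a plain RSS 2.0 feed; Atom feeds use a namespace and `<link href="...">`, which this does not handle. `RssLinks.FromRss` is an illustrative name.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class RssLinks
{
    // Returns the trimmed <link> text of every <item> in an RSS 2.0 document.
    // Items without a <link> element are skipped.
    public static List<string> FromRss(string rssXml)
    {
        XDocument doc = XDocument.Parse(rssXml);
        return doc.Descendants("item")
                  .Select(item => (string)item.Element("link")) // null if missing
                  .Where(link => !string.IsNullOrWhiteSpace(link))
                  .Select(link => link.Trim())
                  .ToList();
    }
}
```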

Wim Haanstra
A: 

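You can do a simple WebRequest to download the page, then search through the HTML for the content you want to parse.

    using System.Diagnostics;
    using System.IO;
    using System.Net;

    // Download the page (placeholder URL) and save it to a local file.
    WebRequest req = WebRequest.Create("http://www.domain.com/news.html");
    req.Proxy = null;   // bypass the system proxy (avoids auto-detection delay)
    using (WebResponse res = req.GetResponse())
    using (Stream s = res.GetResponseStream())
    using (StreamReader sr = new StreamReader(s))
        File.WriteAllText("news.html", sr.ReadToEnd());

    // Search through the saved HTML for the news content here.

    // Open the saved file in its default program (e.g. a browser).
    Process.Start(new ProcessStartInfo("news.html") { UseShellExecute = true });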
mbcrump
A: 

This program is meant to work on all kinds of news websites, whatever their template, and I want to intelligently determine which of the links on the page are news. I know how to fetch the HTML of a website and build the DOM from it, but I want to know how to figure out which links on the site are news items. There are banners, categories, an "About us" page and many other links, but I only want the news links.
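
There is no reliable, template-independent rule for this, but one possible approach is to score each link with heuristics. The sketch below is entirely an assumption: the signals (a date in the URL, a long hyphenated slug, a deep path, headline-length anchor text, a numeric id), their weights, the navigation word list and the threshold of 3 are all made up for illustration and would need tuning against real sites. `NewsLinkHeuristics` is a hypothetical name.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class NewsLinkHeuristics
{
    // Path segments typical of non-article pages (assumed list; tune per site).
    private static readonly string[] NavWords =
        { "about", "contact", "category", "tag", "tags", "login", "register", "search", "privacy" };

    // A /yyyy/mm/ or /yyyy/mm/dd/ date segment, common in article URLs.
    private static readonly Regex DatePattern =
        new Regex(@"/(19|20)\d{2}/\d{1,2}(/\d{1,2})?/", RegexOptions.Compiled);

    // A run of 5+ digits, often an article id (4 digits would match years).
    private static readonly Regex LongIdPattern =
        new Regex(@"\d{5,}", RegexOptions.Compiled);

    // Higher score = more likely a news article. Weights are arbitrary.
    public static int Score(string path, string anchorText)
    {
        string p = path.ToLowerInvariant();
        string[] segments = p.Split(new[] { '/' }, StringSplitOptions.RemoveEmptyEntries);
        int score = 0;

        if (DatePattern.IsMatch(p)) score += 2;             // date in the URL
        if (LongIdPattern.IsMatch(p)) score += 1;           // numeric article id
        string last = segments.LastOrDefault() ?? "";
        if (last.Count(c => c == '-') >= 3) score += 2;     // long hyphenated slug
        if (segments.Length >= 3) score += 1;               // deep path

        int words = anchorText.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries).Length;
        if (words >= 5) score += 2;                         // headline-like text

        if (segments.Any(s => NavWords.Contains(s))) score -= 3; // navigation page
        return score;
    }

    public static bool IsLikelyNews(string path, string anchorText, int threshold = 3)
        => Score(path, anchorText) >= threshold;
}
```

Pass the path part of the URL, e.g. `new Uri(pageUri, href).AbsolutePath`, which also resolves relative hrefs against the page address. For example, `/news/2010/05/14/mayor-opens-new-city-library` with the anchor text "Mayor opens new city library today" scores 7, while `/category/sports` with "Sports" scores -3. Comparing link sets across several page loads (article links change, navigation links stay the same) is another signal that could be combined with these.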

Ali