Folks, I'm trying to extract data from a web page using C#. For the moment I read the Stream from the WebResponse and parse it as one big string, which is long and painful. Does anyone know a better way to extract data from a web page? I saw WinHTTP, but it isn't for C#.

A:

To download data from a web page, it is easier to use WebClient than to read the WebResponse stream yourself:

using System.Net;

string data;
using (var client = new WebClient())
{
    data = client.DownloadString("http://www.google.com");
}

For parsing downloaded data, provided that it is HTML, you could use the excellent Html Agility Pack library.

And here's a complete example extracting all the links from a given page:

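using System;
using System.Net;
using HtmlAgilityPack;

class Program
{
    static void Main(string[] args)
    {
        using (var client = new WebClient())
        {
            // Download the raw HTML of the page as a string.
            string data = client.DownloadString("http://www.google.com");

            // Load it into Html Agility Pack's DOM, which tolerates malformed HTML.
            HtmlDocument doc = new HtmlDocument();
            doc.LoadHtml(data);

            // Select every <a> element that has an href attribute.
            // SelectNodes can return null (not an empty collection) when nothing matches.
            var nodes = doc.DocumentNode.SelectNodes("//a[@href]");
            if (nodes == null)
            {
                return;
            }

            foreach (HtmlNode link in nodes)
            {
                HtmlAttribute att = link.Attributes["href"];
                Console.WriteLine(att.Value);
            }
        }
    }
}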
Darin Dimitrov
A: 

If the web page is valid XHTML, you can load it into an XPathDocument and use XPath queries to go straight to the data you want, quickly and easily. If it isn't valid XHTML, you will need a dedicated HTML parser instead; there are several available for .NET.
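A minimal sketch of the XPathDocument approach, assuming the page is well-formed XHTML that declares the standard XHTML namespace (http://www.w3.org/1999/xhtml). Because XHTML elements live in that namespace, XPath queries must use a prefix bound to it; the "h" prefix, the XhtmlExtractor class and the SelectText method are hypothetical names chosen for illustration. The DOCTYPE is skipped rather than downloaded, so a page that uses named entities such as &nbsp; would fail to parse with these settings.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.XPath;

// Hypothetical helper for illustration; not part of any library.
public static class XhtmlExtractor
{
    // Standard XHTML namespace; elements must be queried through a prefix bound to it.
    private const string XhtmlNamespace = "http://www.w3.org/1999/xhtml";

    // Returns the string value of every node matched by the XPath expression.
    // Use the "h:" prefix for XHTML elements, e.g. "//h:td" or "//h:a/@href".
    public static List<string> SelectText(string xhtml, string xpath)
    {
        var settings = new XmlReaderSettings
        {
            // Skip the DOCTYPE instead of fetching the DTD from the network.
            DtdProcessing = DtdProcessing.Ignore
        };

        var results = new List<string>();
        using (var stringReader = new StringReader(xhtml))
        using (var xmlReader = XmlReader.Create(stringReader, settings))
        {
            // XPathDocument is a read-only store optimized for XPath queries.
            var document = new XPathDocument(xmlReader);
            XPathNavigator navigator = document.CreateNavigator();

            // Bind the "h" prefix to the XHTML namespace for use in queries.
            var namespaces = new XmlNamespaceManager(navigator.NameTable);
            namespaces.AddNamespace("h", XhtmlNamespace);

            XPathNodeIterator iterator = navigator.Select(xpath, namespaces);
            while (iterator.MoveNext())
            {
                results.Add(iterator.Current.Value);
            }
        }
        return results;
    }
}
```

For example, SelectText(page, "//h:a/@href") would return the href values of all links, since the value of an attribute node is its text.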

I found a similar question whose answers should help: http://stackoverflow.com/questions/100358/looking-for-c-html-parser

Matthew
A: 

There is a better way to extract data from the web: use dedicated web-scraping software. It's easier, faster and better! It's up to you which one you prefer out of the hundreds available. I have used webextractor, Mozenda and Automation Anywhere. Mozenda is good but very slow. Automation Anywhere is better: it takes a little work in the GUI, but it works great. You can see the details on this web data extraction page.

Bob