I am working on a project in which I need to log in to a site and scrape the webpage contents. I tried the following code:

// Requires: using System.Net; using System.Text;
protected void Page_Load(object sender, EventArgs e)
{
    WebClient webClient = new WebClient();
    string strUrl = "http://www.mail.yahoo.com?username=sakthivel123&password=operator&login=1";
    // Download the raw bytes of the page at strUrl.
    byte[] reqHTML = webClient.DownloadData(strUrl);
    // Decode the bytes as UTF-8 and display the HTML in the label.
    UTF8Encoding objUTF8 = new UTF8Encoding();
    Label1.Text = objUTF8.GetString(reqHTML);
}

This scrapes only the login page of the mail site, but I need to scrape my inbox details. Please tell me how to proceed. Thanks in advance.

A: 

See this question - Writing a C# program that scans ecommerce website and extracts products pictures + prices + description from them

P.S.: It's called "scrape", and the act of performing a screen scrape is called (you guessed it!) "screen scraping". The word "scrap", when used as a verb, means to discard, as in "the project has been scrapped!" ;-)

Cerebrus
+1  A: 

Please see this question and the related questions. We have to study the HTML source of a webpage before we can scrape it properly. So log in manually, get the source of the inbox page, and study it to work out how to scrape it.

Why don't you use Yahoo's webmail API? That would be a better solution.

Shoban
I need to log in and scrape the webpage. I already have the code for scraping the page, but I need to log in automatically before scraping the webpage contents.
Sakthivel
A: 

I'd suggest you first use a tool called Fiddler to analyze the communication between the target site and your browser. You can look at all the HTTP headers, cookies, content, etc.

Once your webClient object is able to replicate the actions of a browser, including logging in, setting the appropriate cookies, etc., you can automate the procedure.
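As a rough illustration of the cookie part, here is a minimal sketch of a WebClient subclass that keeps cookies between requests, so that the session cookie set by the login response is sent with the next request. The class names CookieAwareWebClient and LoginScraper, the loginUrl and pageUrl parameters, and the "username"/"password" form field names are all assumptions made up for this example; the real URLs and fields have to come from what you see in Fiddler, and a real site may need extra hidden fields or headers.

```csharp
using System;
using System.Collections.Specialized;
using System.Net;

// WebClient does not keep cookies between requests on its own, so this
// subclass attaches one shared CookieContainer to every HTTP request.
public class CookieAwareWebClient : WebClient
{
    private readonly CookieContainer cookies = new CookieContainer();

    public CookieContainer Cookies
    {
        get { return cookies; }
    }

    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);
        HttpWebRequest httpRequest = request as HttpWebRequest;
        if (httpRequest != null)
        {
            // Cookies received on earlier responses are sent again here.
            httpRequest.CookieContainer = cookies;
        }
        return request;
    }
}

public static class LoginScraper
{
    // loginUrl, pageUrl and the form field names are hypothetical;
    // take the real ones from the traffic captured in Fiddler.
    public static string FetchAfterLogin(string loginUrl, string pageUrl,
                                         string user, string password)
    {
        using (CookieAwareWebClient client = new CookieAwareWebClient())
        {
            NameValueCollection form = new NameValueCollection();
            form.Add("username", user);
            form.Add("password", password);

            // POST the login form; any session cookie lands in client.Cookies.
            client.UploadValues(loginUrl, "POST", form);

            // Request the protected page with the same cookies attached.
            return client.DownloadString(pageUrl);
        }
    }
}
```

Posting the credentials in the request body, rather than putting them in the query string as in the question, is closer to what a browser does when it submits a login form.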

And finally, once you have the desired HTML, use regular expressions to extract the information you want from it.
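For the extraction step, something along these lines might work. It is only a sketch: the InboxParser class and the assumption that each subject sits in a <td class="subject"> cell are invented for the example, so the pattern has to be adapted to the actual inbox HTML.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

public static class InboxParser
{
    // Hypothetical markup: each message subject is assumed to be in
    // <td class="subject">...</td>. Adjust the pattern to the real page.
    private static readonly Regex SubjectCell = new Regex(
        "<td[^>]*class=\"subject\"[^>]*>(.*?)</td>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    public static List<string> ExtractSubjects(string html)
    {
        List<string> subjects = new List<string>();
        foreach (Match match in SubjectCell.Matches(html))
        {
            // Group 1 is the cell content; decode entities such as &amp;
            // and trim the surrounding whitespace.
            string text = WebUtility.HtmlDecode(match.Groups[1].Value).Trim();
            subjects.Add(text);
        }
        return subjects;
    }
}
```

Regular expressions like this tend to break when the markup is nested or changes, so they suit small, well-understood fragments best.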

nandos