I need some information from a website that isn't mine. To get it, I have to log in to the website first, which happens through an HTML form. How can I do this authenticated screen scraping in C#?

Extra information:

  • Cookie based authentication.
  • POST action needed.
+4  A: 

You'd make the request as though you'd just filled out the form. Assuming it's a POST, for example, you make a POST request with the correct data. If you can't log in directly to the same page you want to scrape, you will have to track whatever cookies are set after your login request and include them in your scraping request so that you stay logged in.

It might look like:

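// Needs: using System; using System.IO; using System.Net; using System.Text;
CookieContainer cookies = new CookieContainer(); // shared by both requests
HttpWebRequest http = (HttpWebRequest)WebRequest.Create(url);
http.KeepAlive = true; // assigning "Keep-alive" to http.Connection throws ArgumentException
http.Method = "POST";
http.ContentType = "application/x-www-form-urlencoded";
http.CookieContainer = cookies; // without a container, httpResponse.Cookies stays empty
// The field names must match the name attributes of the form's inputs.
string postData = "FormNameForUserId=" + Uri.EscapeDataString(strUserId)
    + "&FormNameForPassword=" + Uri.EscapeDataString(strPassword);
byte[] dataBytes = Encoding.UTF8.GetBytes(postData);
http.ContentLength = dataBytes.Length;
using (Stream postStream = http.GetRequestStream())
{
    postStream.Write(dataBytes, 0, dataBytes.Length);
}
using (HttpWebResponse httpResponse = (HttpWebResponse)http.GetResponse())
{
    // Probably want to inspect httpResponse.StatusCode and httpResponse.Headers here first
}
http = (HttpWebRequest)WebRequest.Create(url2);
http.CookieContainer = cookies; // sends back the cookies set by the login response
using (HttpWebResponse httpResponse2 = (HttpWebResponse)http.GetResponse())
using (StreamReader reader = new StreamReader(httpResponse2.GetResponseStream()))
{
    string html = reader.ReadToEnd(); // the page you want to scrape
}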

Maybe.
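
If the form has more than two fields, concatenating the POST body by hand gets error-prone. Here is a minimal sketch of a helper that builds the body from name/value pairs and escapes each one. The class name FormEncoder and the method BuildBody are made up for this example; only Uri.EscapeDataString and StringBuilder come from the framework.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class FormEncoder
{
    // Hypothetical helper: builds an application/x-www-form-urlencoded body
    // from name/value pairs, escaping both names and values.
    public static string BuildBody(IEnumerable<KeyValuePair<string, string>> fields)
    {
        var sb = new StringBuilder();
        foreach (var field in fields)
        {
            if (sb.Length > 0) sb.Append('&'); // separator between pairs
            sb.Append(Uri.EscapeDataString(field.Key));
            sb.Append('=');
            sb.Append(Uri.EscapeDataString(field.Value ?? string.Empty));
        }
        return sb.ToString();
    }
}
```

You would pass it something like a List<KeyValuePair<string, string>> holding the user id and password fields, then hand the result to Encoding.UTF8.GetBytes as in the request code. Note that Uri.EscapeDataString writes a space as %20 rather than +, which servers normally accept for form posts.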

dlamblin
Thank you, this looks like something I could use, I'll accept this answer if it works when I get back to programming. :-)
TomWij
A: 

You need to use HttpWebRequest and do a POST. These links should help you get started. The key is to look at the HTML form of the page you're posting from and find all the parameters the form needs in order to submit the post.

http://www.netomatix.com/httppostdata.aspx

http://geekswithblogs.net/rakker/archive/2006/04/21/76044.aspx
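
To see which parameters a form expects, you can also download the login page and list its input elements, including hidden ones that the server may require. The following is only a rough sketch: FormInspector and GetInputFields are names invented here, and a regular expression is a fragile way to read HTML (it handles quoted attributes only), so treat it as a quick inspection aid rather than a real parser.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

public static class FormInspector
{
    // Hypothetical helper: returns name -> value for every <input> tag
    // that carries a name attribute. Inputs without a value map to "".
    public static Dictionary<string, string> GetInputFields(string html)
    {
        var fields = new Dictionary<string, string>();
        foreach (Match tag in Regex.Matches(html, @"<input\b[^>]*>", RegexOptions.IgnoreCase))
        {
            string name = GetAttribute(tag.Value, "name");
            if (name == null) continue; // unnamed inputs are not submitted
            fields[name] = GetAttribute(tag.Value, "value") ?? string.Empty;
        }
        return fields;
    }

    // Reads a single- or double-quoted attribute; returns null if absent.
    private static string GetAttribute(string tag, string attribute)
    {
        Match m = Regex.Match(
            tag,
            @"(?<![\w-])" + attribute + @"\s*=\s*(?:""([^""]*)""|'([^']*)')",
            RegexOptions.IgnoreCase);
        if (!m.Success) return null;
        string raw = m.Groups[1].Success ? m.Groups[1].Value : m.Groups[2].Value;
        return WebUtility.HtmlDecode(raw);
    }
}
```

Every name it returns is a candidate key for the POST data. Hidden fields usually need to be sent back with the value the page gave you.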

BFree
Yup, the hardest part, which I forgot to mention (because it's needed most of the time), is keeping the cookie with me for the next page.
TomWij
+1  A: 

You can use a WebBrowser control. Just feed it the URL of the site, then use the DOM to set the username and password into the right fields, and finally send a click to the submit button. This way you don't care about anything but the two input fields and the submit button. No cookie handling, no raw HTML parsing, no HTTP sniffing - all that is done by the browser control.

If you go that way, a few more suggestions:

  1. You can prevent the control from loading add-ins such as Flash - could save you some time.
  2. Once you log in, you can obtain whatever information you need from the DOM - no need to parse raw HTML.
  3. If you want to make the tool more resilient in case the site changes in the future, you can replace your explicit DOM manipulation with injected JavaScript. The JS can be loaded from an external resource, and once called it can populate the fields and submit the form.
eran
The problem is that I can't create a GUI form in this part of the application.
TomWij
Well, that's too bad. If you get tired of doing the low-level stuff, you can try writing a separate GUI app that is spawned from your app, does the scraping and communicates the results back. But that's kind of a stretch...
eran
A: 

BFree:

Can you explain a little further, please, how I can find out exactly which form parameters I need to set? I am trying Hotmail and Flickr but having no luck. I think I am not setting the right parameters. Here is some code...

PostSubmitter post = new PostSubmitter();
post.Url = url;
post.PostItems.Add("email", username);
post.PostItems.Add("password", password);
post.Type = PostSubmitter.PostTypeEnum.Post;
string result = post.Post();
Response.Write(result);

I am using the "PostSubmitter" class from the second link you posted.

Thanks

--tolga

Tolga
Look at the page's source code to determine the form parameters. If you learn about HTML forms and the way PHP or ASP.NET handles the submitted data, then you should understand how to determine them. Dlamblin's solution worked fine for me; no clue if PostSubmitter works.
TomWij