What solutions exist for screen scraping a site over SSL for use with .NET?

My use case is that I need to login to a partner website (https), navigate through a dynamic hierarchy, and download a zipped file of reports.

I could certainly use other screen scrapers if there are no viable options in .NET, either through the framework or open source.

+3  A: 

The gold standard for screen scraping in .NET is the HTML Agility Pack.
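As a rough illustration of what parsing with the HTML Agility Pack looks like, here is a minimal sketch that pulls every link out of a page you have already downloaded as a string. The class and method names (`LinkSketch`, `ExtractLinks`) are made up for this example, and it assumes the HtmlAgilityPack package is referenced in your project.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

public static class LinkSketch
{
    // Hypothetical helper: returns the href of every <a> element in the HTML.
    public static List<string> ExtractLinks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var links = new List<string>();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors == null)
        {
            return links;
        }

        foreach (HtmlNode anchor in anchors)
        {
            links.Add(anchor.GetAttributeValue("href", ""));
        }
        return links;
    }
}
```

This only covers the parsing half; fetching the page over HTTPS and staying logged in is a separate step.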

As far as retrieving pages over HTTPS, try this article:

(As mentioned in other answers, you may actually be after automation rather than screen scraping, in which case you may be better off with WatiN, a framework originally designed for automated web testing but flexible enough for what you want.)

Colin Pickard
+3  A: 

Perhaps consider WatiN to simulate navigating, or WebClient if you can find the items yourself and simulate the logic.

Jeff Moser
WatiN worked great. I would have shot myself if I had to parse out all of the HTML elements manually.
Even Mien
+2  A: 

You can certainly do this with HttpWebRequest, but keeping track of the cookies used for logging in may be non-trivial. I would recommend using Watir (Ruby) or WatiN (C#). Both will handle all of that for you.
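If you do go the HttpWebRequest route, the cookie tracking mostly comes down to sharing one CookieContainer across all requests. The following is only a sketch: the URLs, the form field names (`username`, `password`), and the class name `PartnerSiteSketch` are placeholders, and a real login form may also need hidden fields or tokens that this does not handle.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

public static class PartnerSiteSketch
{
    // Hypothetical URLs; replace with the real partner site's addresses.
    const string LoginUrl = "https://partner.example.com/login";
    const string ReportUrl = "https://partner.example.com/reports/latest.zip";

    // URL-encodes the (assumed) login form fields into a POST body.
    public static string BuildFormBody(string user, string password)
    {
        return "username=" + Uri.EscapeDataString(user) +
               "&password=" + Uri.EscapeDataString(password);
    }

    public static void DownloadReport(string user, string password, string targetPath)
    {
        // One CookieContainer shared by every request carries the session along.
        var cookies = new CookieContainer();

        var login = (HttpWebRequest)WebRequest.Create(LoginUrl);
        login.Method = "POST";
        login.ContentType = "application/x-www-form-urlencoded";
        login.CookieContainer = cookies;

        byte[] body = Encoding.UTF8.GetBytes(BuildFormBody(user, password));
        login.ContentLength = body.Length;
        using (Stream requestStream = login.GetRequestStream())
        {
            requestStream.Write(body, 0, body.Length);
        }

        // Completing the request lets the container collect the session cookies.
        using (login.GetResponse()) { }

        // Reuse the same container so the download is sent with the login cookies.
        var download = (HttpWebRequest)WebRequest.Create(ReportUrl);
        download.CookieContainer = cookies;
        using (WebResponse response = download.GetResponse())
        using (Stream input = response.GetResponseStream())
        using (FileStream output = File.Create(targetPath))
        {
            input.CopyTo(output);
        }
    }
}
```

Navigating a dynamic hierarchy this way means repeating the request step for each page and parsing out the next link yourself, which is the part the browser automation tools save you from.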

From the WatiN website, here is an example:

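// Requires the WatiN.Core namespace; Assert comes from a unit-testing framework.
public void SearchForWatiNOnGoogle()
{
    // Opens Internet Explorer at the given URL; disposing the instance closes it.
    using (IE ie = new IE("http://www.google.com"))
    {
        // Type into the search box and click the search button.
        ie.TextField(Find.ByName("q")).TypeText("WatiN");
        ie.Button(Find.ByName("btnG")).Click();

        // Check that the results page mentions WatiN.
        Assert.IsTrue(ie.ContainsText("WatiN"));
    }
}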
consultutah
+1  A: 

I've heard of people hosting the browser in their program, and scraping with jQuery. Seems great to me since jQuery is great for searching the DOM.

Lance Fisher
A: 

I've tried WatiN and it works properly in a console application, but I want to scrape a secure (https) page of a website from an ASP.NET application. Is that possible with WatiN?

Please advise.

Ajit