How to screen scrape HTTPS using C#?

+4  A: 

You can use System.Net.WebClient to make the HTTPS request and pull down the page you want to scrape.
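As a rough illustration of the suggestion above, here is a minimal sketch of downloading a page over HTTPS with WebClient. The class name `HttpsScraper`, the method name `DownloadPage`, and the User-Agent string are made up for this example. The TLS line is an assumption about older .NET Framework defaults and may be unnecessary on your runtime.

```csharp
using System;
using System.Net;

public static class HttpsScraper
{
    // Downloads the page at the given URL and returns its HTML as a string.
    public static string DownloadPage(string url)
    {
        // Assumption: some older .NET Framework versions do not enable
        // TLS 1.2 by default, so it is switched on explicitly here.
        ServicePointManager.SecurityProtocol |= SecurityProtocolType.Tls12;

        using (var client = new WebClient())
        {
            // Some sites reject requests that carry no User-Agent header.
            // The value below is a placeholder, not a required string.
            client.Headers[HttpRequestHeader.UserAgent] =
                "Mozilla/5.0 (compatible; ExampleScraper/1.0)";

            // WebClient picks HTTPS automatically when the URL starts with https://
            return client.DownloadString(url);
        }
    }
}
```

Calling `HttpsScraper.DownloadPage("https://example.com/")` would return the page source, which you can then hand to whatever parser you prefer.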

Aequitarum Custos
And what if you need to log in to get the HTTPS content?
Oded
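You'll need a CookieContainer so that cookies are passed across multiple requests (e.g. the login page and then the content page). WebClient does not expose one directly, so subclass it and override GetWebRequest to assign the container to the underlying HttpWebRequest.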
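A minimal sketch of that idea, assuming a plain form-based login: the subclass overrides GetWebRequest so every request shares one CookieContainer. The names `CookieAwareWebClient`, `LoginScraper`, and `FetchAfterLogin`, and the form field names `username` and `password`, are hypothetical. Check the real login form for the fields it expects.

```csharp
using System;
using System.Collections.Specialized;
using System.Net;

// Attaches one shared CookieContainer to every request this client makes.
public class CookieAwareWebClient : WebClient
{
    public CookieContainer Cookies { get; } = new CookieContainer();

    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);
        var httpRequest = request as HttpWebRequest;
        if (httpRequest != null)
        {
            // Cookies set by earlier responses are sent with later requests.
            httpRequest.CookieContainer = Cookies;
        }
        return request;
    }
}

public static class LoginScraper
{
    // Posts the login form, then fetches the protected page with the same cookies.
    public static string FetchAfterLogin(
        string loginUrl, string contentUrl, string user, string password)
    {
        using (var client = new CookieAwareWebClient())
        {
            var form = new NameValueCollection
            {
                { "username", user },     // hypothetical field name
                { "password", password }  // hypothetical field name
            };
            client.UploadValues(loginUrl, "POST", form);
            return client.DownloadString(contentUrl);
        }
    }
}
```

Sites that use hidden anti-forgery tokens or JavaScript-driven logins will need extra steps that this sketch does not cover.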
Danny Tuppeny
The site is using URL rewriting. How do I get the complete URL?
Jignesh
If you're talking about server-side URL rewriting, no idea. But if you're talking about JavaScript, simply parse it in code.
Aequitarum Custos
+3  A: 

Look into the Html Agility Pack.
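For a sense of what that looks like, here is a small sketch that pulls every link out of an HTML string. It assumes the HtmlAgilityPack NuGet package is referenced. `LinkExtractor` and `ExtractLinks` are names invented for this example.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

public static class LinkExtractor
{
    // Returns the href value of every <a> element that has one.
    public static List<string> ExtractLinks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var links = new List<string>();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors == null)
        {
            return links;
        }

        foreach (HtmlNode anchor in anchors)
        {
            links.Add(anchor.GetAttributeValue("href", string.Empty));
        }
        return links;
    }
}
```

The HTML string can come from any source, for example a page downloaded with WebClient.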

RichardOD
+1  A: 

If you're having trouble accessing the page with a web client, or you want the request to look like it comes from a browser, you could host the WebBrowser control in an app, load the page in it, and read the source of the loaded content from the control.
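One way this could look, as a rough sketch only: it assumes Windows and a reference to System.Windows.Forms, and runs the control on its own STA thread with a message loop. `BrowserScraper` and `GetRenderedHtml` are hypothetical names. The completion check is simplistic and may never fire if the page redirects, so a real app would want a timeout.

```csharp
using System;
using System.Threading;
using System.Windows.Forms;

public static class BrowserScraper
{
    // Loads the URL in a WebBrowser control and returns the document source.
    public static string GetRenderedHtml(string url)
    {
        string html = null;

        var thread = new Thread(() =>
        {
            using (var browser = new WebBrowser())
            {
                browser.ScriptErrorsSuppressed = true;
                browser.DocumentCompleted += (sender, e) =>
                {
                    // DocumentCompleted also fires for frames, so only act
                    // when the top-level document has finished loading.
                    if (e.Url == browser.Url)
                    {
                        html = browser.DocumentText;
                        Application.ExitThread(); // ends Application.Run below
                    }
                };
                browser.Navigate(url);
                Application.Run(); // message loop the control needs to work
            }
        });

        // The WebBrowser control requires a single-threaded apartment.
        thread.SetApartmentState(ApartmentState.STA);
        thread.Start();
        thread.Join();
        return html;
    }
}
```

In an existing Windows Forms app you would normally drop the control on a form and handle DocumentCompleted there instead of spinning up a separate thread.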

Cyril Gupta
+4  A: 

You can use System.Net.WebClient to grab web pages. Here is an example: http://www.codersource.net/csharp_screen_scraping.html

Arriu
Link dead. I think this may be the updated link: http://www.codersource.net/microsoft-net/c-advanced/html-screen-scraping-in-c.aspx
Simon_Weaver
A: 

How can I start an HTTPS connection and pull down the secure page using the System.Net.WebClient class?

Ajit