Getting a DocumentStream for HTML websites without using a WinForms WebBrowser control for parsing?

Is this possible? I would like to create some types like:

HtmlDocument doc = new HtmlDocument ("http://www.ms.com");
DocumentStream ds = doc.GetFullStream();

...

Also if possible, please post code.

A: 

Have you tried using the HTML Agility Pack for HTML parsing instead of a UI element?

Simon Svensson
Thanks. Does it return the HTML stream of the URL I pass to it?
Joan Venge
I do not know what type of stream you're after. HTML Agility Pack will provide you with a parsed DOM structure for HTML, just like System.Xml does for XML. It is an excellent tool for parsing HTML, but if you just want to read the HTML content of a page as a string, use the WebClient class as mentioned in the other answer.
Simon Svensson
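To make the "parsed DOM structure" point concrete, here is a minimal sketch, assuming the HtmlAgilityPack NuGet package is referenced. The class name `LinkLister`, the method name `ExtractLinks`, and the sample markup are made up for illustration. It parses an in-memory string so it runs without a network connection; to load a page by URL, the library's `HtmlWeb` class has a `Load(url)` method that returns the same `HtmlDocument` type.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is installed

static class LinkLister
{
    // Hypothetical helper: returns the href value of every <a> element in the markup.
    public static List<string> ExtractLinks(string html)
    {
        HtmlDocument doc = new HtmlDocument();
        doc.LoadHtml(html); // parse the string into a DOM-like tree

        List<string> links = new List<string>();

        // SelectNodes takes an XPath expression and returns null when nothing matches.
        HtmlNodeCollection anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors == null)
        {
            return links;
        }

        foreach (HtmlNode anchor in anchors)
        {
            links.Add(anchor.GetAttributeValue("href", ""));
        }
        return links;
    }

    static void Main()
    {
        // Sample markup invented for this example.
        string html = "<html><body><a href='/one'>One</a><p>text</p><a href='/two'>Two</a></body></html>";

        foreach (string link in ExtractLinks(html))
        {
            Console.WriteLine(link);
        }
    }
}
```

The result is a tree you query with XPath, not a stream, which is why WebClient is the simpler choice when you only need the raw HTML.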
+2  A: 

You may also use WebClient:

    // Requires: using System; using System.IO; using System.Net;
    string url = "http://www.ms.com";

    using (WebClient client = new WebClient())
    {
        // Add a user agent header in case the
        // requested URI contains a query.
        client.Headers.Add("user-agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; .NET CLR 1.0.3705;)");

        // The using blocks close the reader and the stream, even on error.
        using (Stream data = client.OpenRead(url))
        using (StreamReader reader = new StreamReader(data))
        {
            // Do stuff here, e.g. read the whole page as a string.
            string s = reader.ReadToEnd();
            Console.WriteLine(s);
        }
    }
jerjer
Thanks, that's very good code, it works. Do you know if it's possible to retrieve the page as if I were using IE? It seems like this doesn't use IE cookies, which makes sense, but the link I am visiting is a forum and it would be great if I could visit it as if I were logged in. Is that possible?
Joan Venge
You would need to create your own HttpWebRequest and set up the associated CookieContainer for that.
Simon Svensson
Yeah, Simon is right, you need to use HttpWebRequest to set up cookies.
jerjer
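For anyone landing here later, a rough sketch of the cookie approach from the comments above, using `HttpWebRequest` and `CookieContainer` from System.Net. The class name `CookieFetch`, its method names, and the cookie name and value are placeholders invented for the example. A real forum issues its own session cookie at login, so you would either copy that value or post the login form first with the same container. This does not read Internet Explorer's cookie store.

```csharp
using System;
using System.IO;
using System.Net;

static class CookieFetch
{
    // Hypothetical helper: builds a GET request that sends one cookie to the given URL.
    public static HttpWebRequest CreateRequest(string url, string cookieName, string cookieValue)
    {
        Uri uri = new Uri(url);
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uri);
        request.UserAgent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2)";

        // The container holds cookies to send and also collects cookies
        // that the server sets in its response.
        CookieContainer cookies = new CookieContainer();
        cookies.Add(uri, new Cookie(cookieName, cookieValue));
        request.CookieContainer = cookies;

        return request;
    }

    // Sends the request and returns the response body as a string.
    public static string ReadAll(HttpWebRequest request)
    {
        using (WebResponse response = request.GetResponse())
        using (Stream data = response.GetResponseStream())
        using (StreamReader reader = new StreamReader(data))
        {
            return reader.ReadToEnd();
        }
    }

    static void Main()
    {
        // "session" and "placeholder" are made-up values, not a real login cookie.
        HttpWebRequest request = CreateRequest("http://www.ms.com", "session", "placeholder");
        Console.WriteLine(ReadAll(request));
    }
}
```

To stay logged in across several requests, reuse the same `CookieContainer` instance for each request, so cookies set by one response are sent with the next.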