I am trying to use C# to access the content of a webpage. For example, I want to grab the text of the body of the Google homepage.

I know this is doable in C# with its WebBrowser control, but I couldn't find a good, simple example of doing it. All the resources I found online involve creating Forms and GUIs, which I don't need; I just need a good old console application.

If anyone can provide a simple console-based code snippet that accomplishes the above, it'll be greatly appreciated.

+5  A: 

Actually, WebBrowser is a GUI control for visualizing a web page (it embeds and manages Internet Explorer inside your Windows application). If you just need to get the contents of a web page, you can use the WebClient class:

using System;
using System.Net;

class Program
{
    static void Main(string[] args)
    {
        using (var client = new WebClient())
        {
            // Download the raw HTML of the page as a string.
            var contents = client.DownloadString("http://www.google.com");
            Console.WriteLine(contents);
        }
    }
}
Darin Dimitrov
This won't work if the website is dynamically generated with JavaScript (i.e., if the HTML source is just a .js file), right?
Saobi
+1 Nicely done.
Andrew Hare
@Saobi, you are correct: JavaScript will not be executed with this technique. You will only get the plain-text representation of the web page.
Darin Dimitrov
I basically want to send a query to a site and grab the returned results, but the site is all written in JavaScript, so parsing the HTML source like on Google won't help. How can I: 1) send the query without knowing what the request URL is, and 2) parse the contents of a JavaScript-generated page? Do I have to simulate keystrokes and send it in?
Saobi
JavaScript or not, I still think this is the right way to do it. If that means you need to reason about how the JavaScript works so you can transform it yourself, then so be it.
Joel Coehoorn
+1  A: 

If you just want the content and not an actual browser, you can use an HttpWebRequest.

Here's a code sample: http://www.c-sharpcorner.com/Forums/ShowMessages.aspx?ThreadID=58261
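
In case the link goes stale, here's a minimal sketch of that approach (the URL is just a placeholder; swap in whatever page you need):

using System;
using System.IO;
using System.Net;

class Program
{
    static void Main()
    {
        // Build the request for the page we want to fetch.
        var request = (HttpWebRequest)WebRequest.Create("http://www.example.com");

        // Read the entire response body as a string, disposing
        // the response and reader when done.
        using (var response = request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            Console.WriteLine(reader.ReadToEnd());
        }
    }
}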

mgroves
A: 

The HTML Agility Pack might be what you need. It provides access to HTML pages via DOM and XPath.
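
For instance, a minimal sketch might look like this (assuming the HtmlAgilityPack assembly is referenced; HtmlWeb.Load downloads and parses the page):

using System;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Download the page and parse it into a DOM.
        var web = new HtmlWeb();
        HtmlDocument doc = web.Load("http://www.google.com");

        // Select the <body> element via XPath and print its inner text.
        HtmlNode body = doc.DocumentNode.SelectSingleNode("//body");
        Console.WriteLine(body.InnerText);
    }
}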

Zr40
A: 

You can do something like this:

using System.Net;

Uri u = new Uri(@"http://launcher.worldofwarcraft.com/alert");
HttpWebRequest req = (HttpWebRequest)WebRequest.Create(u);
HttpWebResponse res = (HttpWebResponse)req.GetResponse();
System.IO.Stream st = res.GetResponseStream();
System.IO.StreamReader sr = new System.IO.StreamReader(st);
string body = sr.ReadToEnd();
System.Console.WriteLine(body);

The above code shows the maintenance message for WoW USA (if any message has been posted).

Anders K.
A: 

You can also use the WatiN library to load and manipulate web pages easily. It was designed as a testing library for web UIs. To use it, get the latest version from the official site: http://watin.sourceforge.net/ . For C#, the following code in a console application will give you the HTML of the Google home page (this is modified from the getting-started example on the WatiN site). The library also contains many more useful methods for getting and setting various parts of the page, taking actions, and checking for results.

    using System;
    using WatiN.Core;

    namespace Test
    {
      class WatiNConsoleExample
      {
        [STAThread]
        static void Main(string[] args)
        {
          // Open a new Internet Explorer window and
          // go to the Google website.
          IE ie = new IE("http://www.google.com");

          // Write out the text of the body.
          Console.WriteLine(ie.Text);

          // Close Internet Explorer, then wait for a keypress
          // so the console window doesn't vanish immediately.
          ie.Close();
          Console.ReadKey();
        }
      }
    }
Joe Kuemerle
A: 

Google "screen scraping" and, as mentioned above, use HttpWebRequest. Whatever it is you end up doing, I'd recommend using Fiddler to help you figure out what's really going on over the wire.

nickyt