I am trying to use C# to access the content of a webpage. For example, I want to grab the text of the body of the Google homepage.

I know this is doable in C# with its WebBrowser control, but I couldn't find a good, simple example of doing it. All the resources I found online involve creating Forms and GUIs, which I don't need; I just need a good old console application.

If anyone can provide a simple console-based code snippet that accomplishes the above, it'll be greatly appreciated.

+5  A: 

Actually, WebBrowser is a GUI control for visualizing a web page (it embeds and manages Internet Explorer inside your Windows application). If you just need to get the contents of a web page, you can use the WebClient class:

using System;
using System.Net;

class Program
{
    static void Main(string[] args)
    {
        using (var client = new WebClient())
        {
            // Download the raw HTML of the page as a string.
            var contents = client.DownloadString("http://www.google.com");
            Console.WriteLine(contents);
        }
    }
}
Darin Dimitrov
This won't work if the website is dynamically generated with JavaScript (i.e., if the HTML source is just a .js file), right?
Saobi
+1 Nicely done.
Andrew Hare
@Saobi, you are correct: JavaScript will not be executed with this technique. You will only get the plain-text representation of the web page.
Darin Dimitrov
I basically want to send a query to a site and grab the returned results, but the site is all written in JavaScript, so parsing the HTML source like on Google won't help. How can I: 1) send the query without knowing what the request URL is, and 2) parse the contents of a JavaScript-generated page? Do I have to simulate keystrokes and send it in?
Saobi
JavaScript or not, I still think this is the right way to do it. If that means you need to reason about how the JavaScript works so you can transform it yourself, then so be it.
Joel Coehoorn
+1  A: 

If you just want the content and not an actual browser, you can use an HttpWebRequest.

Here's a code sample: http://www.c-sharpcorner.com/Forums/ShowMessages.aspx?ThreadID=58261
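
In case the link goes stale, here's a minimal sketch of that approach (the URL is just a placeholder; swap in whatever page you need):

using System;
using System.IO;
using System.Net;

class Program
{
    static void Main()
    {
        // Build the request for the page we want to fetch.
        var request = (HttpWebRequest)WebRequest.Create("http://www.example.com");

        // Read the entire response body as a string, disposing
        // the response and reader when done.
        using (var response = request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            Console.WriteLine(reader.ReadToEnd());
        }
    }
}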

mgroves
A: 

The HTML Agility Pack might be what you need. It provides access to HTML pages via DOM and XPath.
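
For instance, a minimal sketch might look like this (assuming the HtmlAgilityPack assembly is referenced; HtmlWeb.Load downloads and parses the page):

using System;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Download the page and parse it into a DOM.
        var web = new HtmlWeb();
        HtmlDocument doc = web.Load("http://www.google.com");

        // Select the <body> element via XPath and print its inner text.
        HtmlNode body = doc.DocumentNode.SelectSingleNode("//body");
        Console.WriteLine(body.InnerText);
    }
}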

Zr40
A: 

You can do something like this:

using System.Net;

Uri u = new Uri(@"http://launcher.worldofwarcraft.com/alert");
HttpWebRequest req = (HttpWebRequest)WebRequest.Create(u);
HttpWebResponse res = (HttpWebResponse)req.GetResponse();
System.IO.Stream st = res.GetResponseStream();
System.IO.StreamReader sr = new System.IO.StreamReader(st);
string body = sr.ReadToEnd();
System.Console.WriteLine(body);

The above code shows the maintenance message for WoW USA (if any message has been posted).

Anders K.
A: 

You can also use the WatiN library to load and manipulate web pages easily. It was designed as a testing library for web UIs. To use it, get the latest version from the official site: http://watin.sourceforge.net/ . For C#, the following code in a console application will give you the HTML of the Google home page (this is modified from the getting-started example on the WatiN site). The library also contains many more useful methods for getting and setting various parts of the page, taking actions, and checking for results.

    using System;
    using WatiN.Core;

    namespace Test
    {
      class WatiNConsoleExample
      {
        [STAThread]
        static void Main(string[] args)
        {
          // Open a new Internet Explorer window and
          // go to the Google website.
          IE ie = new IE("http://www.google.com");

          // Write out the text of the body.
          Console.WriteLine(ie.Text);

          // Close Internet Explorer, then wait for a keypress
          // so the console window doesn't vanish immediately.
          ie.Close();
          Console.ReadKey();
        }
      }
    }
Joe Kuemerle
A: 

Google "screen scraping" and, as mentioned above, use HttpWebRequest. Whatever it is you end up doing, I'd recommend using Fiddler to help you figure out what's really going on over the wire.

nickyt