
Hey guys,

I'm trying to programmatically load a web page via the WebBrowser control with the intent of testing the page & its JavaScript functions. Basically, I want to compare the HTML & JavaScript run through this control against a known output to determine whether there is a problem.

However, I'm having trouble simply creating and navigating the WebBrowser control. The code below is intended to load the HtmlDocument into the WebBrowser.Document property:

WebBrowser wb = new WebBrowser();
wb.AllowNavigation = true;

wb.Navigate("http://www.google.com/");

When examining the web browser's state via IntelliSense after Navigate() runs, WebBrowser.ReadyState is 'Uninitialized', WebBrowser.Document is null, and overall the control appears completely unaffected by my call.

On a contextual note, I'm running this control outside of a Windows Forms form: I do not need to load a window or actually look at the page. The requirements are simply to execute the page's JavaScript and examine the resulting HTML.

Any suggestions are greatly appreciated, thanks!

A: 

You should handle the WebBrowser.DocumentCompleted event; once that event is raised, the Document and related properties will be available.

wb.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(wb_DocumentCompleted);


private void wb_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
  WebBrowser wb = sender as WebBrowser;
  // wb.Document is not null at this point
}

Here is a complete example that I quickly wrote and tested in a Windows Forms application.

public partial class Form1 : Form
{
  public Form1()
  {
    InitializeComponent();
  }

  // Hooked up to the form's Load event (e.g. in the designer)
  private void Form1_Load(object sender, EventArgs e)
  {
    WebBrowser wb = new WebBrowser();
    wb.AllowNavigation = true;

    wb.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(wb_DocumentCompleted);

    wb.Navigate("http://www.google.com");
  }

  private void wb_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
  {
    WebBrowser wb = sender as WebBrowser;
    // wb.Document is not null at this point
  }
}

Edit: Here is a simple version of the code that runs a window from a console application. You can, of course, go further and expose the events to the console code, etc.

using System;
using System.Windows.Forms;

namespace ConsoleApplication1
{
  class Program
  {
    [STAThread]
    static void Main(string[] args)
    {
      // Application.Run provides the message pump the WebBrowser needs;
      // it returns once the BrowserWindow is closed.
      Application.Run(new BrowserWindow());

      Console.ReadKey();
    }
  }

  class BrowserWindow : Form
  {
    public BrowserWindow()
    {
      ShowInTaskbar = false;
      WindowState = FormWindowState.Minimized;
      Load += new EventHandler(Window_Load);
    }

    void Window_Load(object sender, EventArgs e)
    {
      WebBrowser wb = new WebBrowser();
      wb.AllowNavigation = true;
      wb.DocumentCompleted += wb_DocumentCompleted;
      wb.Navigate("http://www.bing.com");
    }

    void wb_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
      Console.WriteLine("We have Bing");
      Close(); // ends Application.Run so that Console.ReadKey is reached
    }
  }
}
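If the goal is the HTML *after* the page's JavaScript has run, it may matter which property is read in the handler. As far as I know, `WebBrowser.DocumentText` gives back the markup as it was loaded, while the live DOM (for example `wb.Document.Body.OuterHtml`) reflects changes made by script. A minimal sketch, assuming a reference to System.Windows.Forms, an STA thread, and scripts that run during page load (timers firing later would not be captured); the `RenderedDom` class and `Render` method are hypothetical names, and inline sample markup stands in for a real page so no network access is needed:

```csharp
using System;
using System.Windows.Forms;

// Hypothetical helper: loads markup into a WebBrowser, pumps messages until
// DocumentCompleted fires, and returns the DOM as it stands at that point.
static class RenderedDom
{
    public static string Render(string html)
    {
        string result = null;
        using (var wb = new WebBrowser())
        {
            wb.ScriptErrorsSuppressed = true; // avoid modal script-error dialogs
            wb.DocumentCompleted += (sender, e) =>
            {
                // Body.OuterHtml reads the live DOM, so script changes are visible;
                // wb.DocumentText would return the original source instead.
                result = wb.Document.Body == null ? "" : wb.Document.Body.OuterHtml;
                Application.ExitThread(); // ends the Application.Run loop below
            };
            wb.DocumentText = html;
            Application.Run(); // the control only loads while messages are pumped
        }
        return result;
    }

    [STAThread]
    static void Main()
    {
        string html = "<html><body><div id='x'>before</div>" +
                      "<script>document.getElementById('x').innerHTML='after';</script>" +
                      "</body></html>";
        Console.WriteLine(Render(html));
    }
}
```

For a real page, the same handler would be wired up before calling `wb.Navigate(url)` instead of setting `DocumentText`.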
Chris Taylor
This unfortunately does not work. The web browser never seems to even so much as attempt to load the page. Adding this code to my original snippet results in the DocumentCompleted event never being called and the program exiting.
Dave
@Dave, that is strange. I attached a complete piece of code that I quickly tested to confirm it works.
Chris Taylor
@Chris, I think the significant difference is I'm not creating or using the WebBrowser within a Form or class. Instead, I'm creating/using the WebBrowser from a library-like context (no UI) without intending to ever visually display the output. The goal was to simply get the HTML produced from requesting that page, after it has been processed by the client-side JavaScript embedded in that page.
Dave
@Dave, the problem there is that console applications do not have a message pump to process messages.
Chris Taylor
@Dave, I added a simplistic sample of how you could handle this from a console app.
Chris Taylor
@Chris, Thanks. I'll give it a shot and let you know if I have any luck.
Dave
@Chris, The console version of the code works much better for my purposes, and was able to get the HTML in the event handler. However, it didn't appear to run the JavaScript embedded in the page as I had hoped. Nonetheless, this solution circumvented one of the challenges, many thanks!
Dave
A: 

You probably need to host the control in a parent window. You can do this without breaking requirements by simply not showing the window that hosts the browser control by moving it off screen. It might also be useful for development to "see" that it does actually load something for testing, verification etc.

So try:

// in a form's Load handler:

WebBrowser wb = new WebBrowser();
this.Controls.Add(wb);
wb.AllowNavigation = true;
wb.Navigate("http://www.google.com/");

Also check which other properties are set on the WebBrowser object when you instantiate it via the IDE. For example, create a Form, drop a browser control onto it, and then check the form's designer file to see what code is generated. You might be missing some key property that needs to be set. I've discovered many an omission in my code this way and also learned how to properly instantiate visual objects programmatically.

P.S. If you do use a host window, it should only be visible during development. You would hide it in some manner for production.
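A minimal sketch of the off-screen idea, assuming a Windows Forms project with an STA entry point: the host form is placed far outside the visible desktop and kept out of the taskbar, so the browser has a real, shown parent window without anyone seeing it. The `OffscreenHost` class name and the -32000 coordinates are illustrative choices, not requirements.

```csharp
using System;
using System.Drawing;
using System.Windows.Forms;

// Hypothetical host form: shown normally, but positioned off screen.
class OffscreenHost : Form
{
    private readonly WebBrowser wb = new WebBrowser();

    public OffscreenHost(string url)
    {
        // StartPosition must be Manual, otherwise Location is ignored on Show.
        StartPosition = FormStartPosition.Manual;
        Location = new Point(-32000, -32000);
        Size = new Size(1024, 768);
        ShowInTaskbar = false;

        wb.Dock = DockStyle.Fill;
        wb.ScriptErrorsSuppressed = true;
        wb.DocumentCompleted += OnDocumentCompleted;
        Controls.Add(wb); // give the browser a real parent window

        Load += (s, e) => wb.Navigate(url);
    }

    private void OnDocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        // DocumentCompleted can fire once per frame; react only to the top-level page.
        if (e.Url != wb.Url) return;
        Console.WriteLine(wb.Document.Title);
        Close(); // closing the main form makes Application.Run return
    }

    [STAThread]
    static void Main()
    {
        Application.Run(new OffscreenHost("http://www.google.com/"));
    }
}
```

During development, commenting out the `Location` line brings the window back on screen so you can watch the page load.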

Another approach:

You could go "raw" by trying something like this:

using (System.Net.WebClient wc = new System.Net.WebClient())
using (System.IO.StreamReader webReader = new System.IO.StreamReader(
         wc.OpenRead("http://your_website.com")))
{
  string webPageData = webReader.ReadToEnd();
  // ...parse webPageData here
}

...then RegEx or parse webPageData for what you need. Or do you need the jscript in the page to actually execute? (Which should be possible with .NET 4.0)

Paul Sasik
It's important to note that I'm also running this application under [STAThread] (the program's entry point has this attribute). Otherwise, an exception is thrown saying that the ActiveX control is only supported on an STA thread.

In the same function where I created the WebBrowser, I also tried creating a form and placing the WebBrowser control in that form, like:

WebBrowser wb = new WebBrowser();
Form f = new Form();
f.Controls.Add(wb);

Leaving it at that and then calling Navigate does nothing. Showing the window and then calling Navigate also does nothing.
Dave
The intended load for this application dictates a need to avoid Forms and windows: since it needs to run 1000+ tests a day using this WebBrowser (if it's practical), it would be impractical to have someone sit and watch each page load.
Dave
@Paul, In response to your "Another approach" section, a raw format would be ok, however I -do- need the jscript on the page to execute. There is a particular interest in whether the jscript embedded in the pages being tested by this application appropriately modify the document.
Dave
A: 

The WebBrowser control is just a wrapper around Internet Explorer.

You can put it onto an invisible Windows Forms window to fully instantiate it.
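A minimal sketch of the invisible-window idea, assuming the host form is never shown and `Application.Run` supplies the message pump (a form with the control added but no message loop running is the situation described in the comments where Navigate appeared to do nothing). It visits a queue of URLs one after another inside a single loop, which might suit running many tests per day. The `HiddenBrowserRunner` name, the URL list, and printing the body length are hypothetical placeholders for the real checks.

```csharp
using System;
using System.Collections.Generic;
using System.Windows.Forms;

// Hypothetical runner: a never-shown Form hosts the WebBrowser, and the
// ApplicationContext keeps the message loop alive until all URLs are done.
class HiddenBrowserRunner : ApplicationContext
{
    private readonly Form host = new Form(); // never shown
    private readonly WebBrowser wb = new WebBrowser();
    private readonly Queue<string> urls;

    public HiddenBrowserRunner(IEnumerable<string> urlsToVisit)
    {
        urls = new Queue<string>(urlsToVisit);
        if (urls.Count == 0)
            throw new ArgumentException("At least one URL is required.", "urlsToVisit");

        wb.ScriptErrorsSuppressed = true;      // no modal script-error dialogs
        wb.DocumentCompleted += OnDocumentCompleted;
        host.Controls.Add(wb);                 // parent window for the control
        wb.Navigate(urls.Dequeue());           // loads once Application.Run pumps messages
    }

    private void OnDocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        if (e.Url != wb.Url) return;           // skip completions of child frames

        // Placeholder check: compare against the known expected HTML instead.
        HtmlElement body = wb.Document.Body;
        string rendered = body == null ? "" : body.OuterHtml;
        Console.WriteLine("{0}: {1} characters", e.Url, rendered.Length);

        if (urls.Count > 0)
            wb.Navigate(urls.Dequeue());
        else
            ExitThread();                      // makes Application.Run(this) return
    }

    protected override void Dispose(bool disposing)
    {
        if (disposing) host.Dispose();         // also disposes the WebBrowser child
        base.Dispose(disposing);
    }

    [STAThread]
    static void Main()
    {
        Application.Run(new HiddenBrowserRunner(
            new[] { "http://www.google.com/", "http://www.bing.com/" }));
    }
}
```

There is no timeout in this sketch, so a page that never raises DocumentCompleted would stall the whole run; a System.Windows.Forms.Timer that skips to the next URL would be one way to guard against that.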

Foxfire