I'm using the .NET WebBrowser control. How do I know when a web page is fully loaded?

I want to know when the browser is not fetching any more data. (The moment when IE writes 'Done' in its status bar...).

Notes:

  • The DocumentComplete/NavigateComplete events might occur multiple times for a web site containing multiple frames.
  • The browser ready state doesn't solve the problem either.
  • I have tried checking the number of frames in the frame collection and then counting the number of DocumentComplete events I receive, but this doesn't work either.
  • this.WebBrowser.IsBusy doesn't work either. It is always 'false' when checking it in the Document Complete handler.
A: 

Hi,

Have you tried the WebBrowser.IsBusy property?

Best Regards

@nand

Anand
yep. The web browser claims not to be busy each time the document complete handler is called...
Yuval Peled
A: 

How about using javascript in each frame to set a flag when the frame is complete, and then have C# look at the flags?

mbeckish
I don't want to manipulate the DOM tree of every site that the browser is navigating to. But suppose I do use your solution, how do I do it in javascript?
Yuval Peled
I don't see the advantage of doing this in JS vs C#.
jeffamaphone
A: 

I don't have an alternative for you, but I wonder if the IsBusy property being true during the Document Complete handler is because the handler is still running and therefore the WebBrowser control is technically still 'busy'.

The simplest solution would be to have a loop that executes every 100 ms or so until the IsBusy flag is reset (with a max execution time in case of errors). That of course assumes that IsBusy will not be set to false at any point during page loading.
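The polling idea above can be sketched as follows. This is only an illustration under assumptions: `Poller` and `WaitUntilAsync` are hypothetical names, the code needs .NET 4.5 or later for `Task.Delay`, and it is awaited rather than run as a blocking loop so that the UI thread can keep pumping messages while the page loads.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

// Hypothetical helper, not part of the WebBrowser API.
static class Poller
{
    // Polls `condition` every `interval` until it returns true
    // or until `timeout` has elapsed.
    // Returns true if the condition was met, false on timeout.
    public static async Task<bool> WaitUntilAsync(
        Func<bool> condition, TimeSpan interval, TimeSpan timeout)
    {
        Stopwatch elapsed = Stopwatch.StartNew();
        while (!condition())
        {
            if (elapsed.Elapsed >= timeout)
                return false;          // give up: maximum execution time reached
            await Task.Delay(interval); // yield instead of blocking the thread
        }
        return true;
    }
}
```

A caller might write something like `bool done = await Poller.WaitUntilAsync(() => !webBrowser.IsBusy, TimeSpan.FromMilliseconds(100), TimeSpan.FromSeconds(30));`, where `webBrowser` is assumed to be the control instance. The result is only as trustworthy as the condition being polled.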

If the Document Complete handler executes on another thread, you could use a lock to put your main thread to sleep and wake it up from the Document Complete thread. Then check the IsBusy flag, re-locking the main thread if it's still true.

roryf
But the IsBusy is set to false too early. For example, if you have six frames in a web page, when the first frame completes loading, the IsBusy is false on DocumentComplete event.
Yuval Peled
Each frame gets its own webbrowser (IWebBrowser2 implementation). Likely the IsBusy attribute only applies to the specific frame. And when it's complete, it's no longer busy.
jeffamaphone
A: 

I'm not sure it'll work, but try adding a JavaScript "onload" event on your frameset, like this:

function everythingIsLoaded() { alert("everything is loaded"); }
var frameset = document.getElementById("idOfYourFrameset");
if (frameset.addEventListener)
    frameset.addEventListener('load',everythingIsLoaded,false); 
else
    frameset.attachEvent('onload',everythingIsLoaded);
paulgreg
I want to be able to know if all frames are loaded for any web site so I don't know which frames it contains.
Yuval Peled
You should do that on the frameset (the parent of all frames), not on each frame. It's pretty easy to get it from any web site, like this: document.getElementsByTagName('frameset')[0]
paulgreg
+1  A: 

Can you use jQuery? Then you could easily bind frame ready events on the target frames. See this answer for directions. This blog post also has a discussion about it. Finally there is a plug-in that you could use.

The idea is that you count the number of frames in the web page using:

$("iframe").size()

and then you count how many times the iframe ready event has been fired.

kgiannakakis
A: 

You will get a BeforeNavigate and DocumentComplete event for the outer web page, as well as for each frame. You know you're done when you get the DocumentComplete event for the outer web page. You should be able to use the managed equivalent of IWebBrowser2::TopLevelContainer() to determine this.

Beware, however, the website itself can trigger more frame navigations anytime it wants, so you never know if a page is truly done forever. The best you can do is keep a count of all the BeforeNavigates you see and decrement the count when you get a DocumentComplete.
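The counting idea above could look roughly like this. It is a minimal sketch: `PendingNavigationCounter` and its members are hypothetical names, and wiring them to the control's navigation-start and document-complete events is left to the caller. It also assumes both events arrive on the same (UI) thread, so no locking is done.

```csharp
// Hypothetical helper: tracks navigations that have started
// but have not yet reported completion.
sealed class PendingNavigationCounter
{
    private int pending;

    // Number of navigations still outstanding.
    public int Pending { get { return pending; } }

    // Call from a BeforeNavigate-style handler.
    public void NavigationStarted()
    {
        pending++;
    }

    // Call from a DocumentComplete-style handler.
    // Returns true when no started navigation is still outstanding.
    public bool DocumentCompleted()
    {
        if (pending > 0)
            pending--;   // never go below zero on an unmatched completion
        return pending == 0;
    }
}
```

If the two events do not pair up one-to-one in practice, the count never reaches zero, so it seems prudent to combine a counter like this with a timeout rather than rely on it alone.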

Edit: Here's the managed docs: TopLevelContainer.

jeffamaphone
I tried counting the before navigates and the document complete in the WebBrowser control. It is not synced... :(. There are more before navigate than document complete. [Maybe it has to do with caching or duplicate frames that are fetched. I don't know].
Yuval Peled
Regarding the document complete event: in C# WebBrowser you don't get the document object that just completed loading. Just the url. So you can't get to its browser container.
Yuval Peled
A: 

Here's what finally worked for me:

    public bool WebPageLoaded
    {
        get
        {
            if (this.WebBrowser.ReadyState != System.Windows.Forms.WebBrowserReadyState.Complete)
                return false;

            if (this.HtmlDomDocument == null)
                return false;

            // iterate over all the Html elements. Find all frame elements and check their ready state
            foreach (IHTMLDOMNode node in this.HtmlDomDocument.all)
            {
                IHTMLFrameBase2 frame = node as IHTMLFrameBase2;
                if (frame != null)
                {
                    if (!frame.readyState.Equals("complete", StringComparison.OrdinalIgnoreCase))
                        return false;

                }
            }

            Debug.Print(this.Name + " - I think it's loaded");
            return true;
        }
    }

On each document complete event I iterate over all the HTML elements and check every frame I find (I know it can be optimized). For each frame I check its ready state. It's pretty reliable, but just like jeffamaphone said, I have already seen sites that trigger some internal refreshes. Still, the above code satisfies my needs.

Edit: every frame can contain frames within it so I think this code should be updated to recursively check the state of every frame.
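A recursive variant might look like the sketch below. It is untested and rests on assumptions: the project references the Microsoft.mshtml interop assembly, `FrameReadiness` and `AllFramesComplete` are hypothetical names, and a frame whose document cannot be read (typically a cross-domain frame) is judged by its own ready state only.

```csharp
using System;
using mshtml; // assumption: Microsoft.mshtml interop assembly is referenced

// Hypothetical helper, not part of the WebBrowser API.
static class FrameReadiness
{
    // Returns true if every frame in `document`, and every frame nested
    // inside those frames, reports a ready state of "complete".
    public static bool AllFramesComplete(IHTMLDocument2 document)
    {
        if (document == null)
            return false;

        foreach (object node in document.all)
        {
            IHTMLFrameBase2 frame = node as IHTMLFrameBase2;
            if (frame == null)
                continue; // not a frame or iframe element

            if (!string.Equals(frame.readyState, "complete",
                               StringComparison.OrdinalIgnoreCase))
                return false;

            IHTMLDocument2 inner;
            try
            {
                IHTMLWindow2 window = frame.contentWindow;
                if (window == null)
                    continue;
                inner = window.document;
            }
            catch (UnauthorizedAccessException)
            {
                // Assumption: access was denied (cross-domain frame),
                // so its nested frames cannot be inspected.
                continue;
            }

            if (!AllFramesComplete(inner))
                return false;
        }
        return true;
    }
}
```

It would be called with the top-level document, for example the object the property above reaches through `this.HtmlDomDocument`, assuming that object implements `IHTMLDocument2`.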

Yuval Peled
+1  A: 

My approach to doing something when page is completely loaded (including frames) is something like this:

using System.Windows.Forms;

    // 'ie' is the WebBrowser control instance.
    protected delegate void Procedure();
    private void executeAfterLoadingComplete(Procedure doNext) {
        WebBrowserDocumentCompletedEventHandler handler = null;
        handler = delegate(object o, WebBrowserDocumentCompletedEventArgs e)
        {
            // react only to the first DocumentCompleted event
            ie.DocumentCompleted -= handler;
            Timer timer = new Timer();
            EventHandler checker = delegate(object o1, EventArgs e1)
            {
                // poll until the browser reports the Complete ready state
                if (WebBrowserReadyState.Complete == ie.ReadyState)
                {
                    timer.Dispose();
                    doNext();
                }
            };
            timer.Tick += checker;
            timer.Interval = 200; // check every 200 ms
            timer.Start();
        };
        ie.DocumentCompleted += handler;
    }

From my other approaches I learned some "don't"-s:

  • don't try to bend the spoon ... ;-)
  • don't try to build elaborate constructs using DocumentComplete, Frames, HtmlWindow.Load events. Your solution will be fragile, if it works at all.
  • don't use System.Timers.Timer instead of Windows.Forms.Timer; strange errors will begin to occur in strange places if you do, because the timer runs on a different thread than the rest of your app.
  • don't use just Timer without DocumentComplete because it may fire before your page even begins to load and will execute your code prematurely.
Kamil Szot
A: 

Here's how I solved the problem in my application:

private void wbPost_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    if (e.Url != wbPost.Url)
        return;
    /* Document now loaded */
}
Daniel Stutzbach
A: 

I just use the webBrowser.StatusText property. When it says "Done" everything is loaded! Or am I missing something?

Jeppoo
+1  A: 

Here's my tested version. Just make this your DocumentCompleted event handler and place the code that you only want to be called once into the method OnWebpageReallyLoaded(). Effectively, this approach determines when the page has been stable for 200ms and then does its thing.

// event handler for when a document (or frame) has completed its download
Timer m_pageHasntChangedTimer = null;
private void webBrowser_DocumentCompleted( object sender, WebBrowserDocumentCompletedEventArgs e ) {
    // dynamic pages will often be loaded in parts e.g. multiple frames
    // need to check the page has remained static for a while before safely saying it is 'loaded'
    // use a timer to do this

    // destroy the old timer if it exists
    if ( m_pageHasntChangedTimer != null ) {
        m_pageHasntChangedTimer.Dispose();
    }

    // create a new timer which calls the 'OnWebpageReallyLoaded' method after 200ms
    // if an additional frame or other content is downloaded in the meantime, this timer will be destroyed
    // and the process repeated
    m_pageHasntChangedTimer = new Timer();
    EventHandler checker = delegate( object o1, EventArgs e1 ) {
        // only if the page has been stable for 200ms already
        // check the official browser state flag, (euphemistically called) 'Ready'
        // and call our 'OnWebpageReallyLoaded' method
        if ( WebBrowserReadyState.Complete == webBrowser.ReadyState ) {
            m_pageHasntChangedTimer.Dispose();
            OnWebpageReallyLoaded();
        }
    };
    m_pageHasntChangedTimer.Tick += checker;
    m_pageHasntChangedTimer.Interval = 200;
    m_pageHasntChangedTimer.Start();
}

private void OnWebpageReallyLoaded() {
    /* place your harvester code here */
}
Daniel Collicott