I have a list of URLs that I need to parse and dump data from. The pages rely on AJAX, and I need the live DOM (not the raw HTML) to parse them correctly, so I am using a WebBrowser control. How do I iterate through the list and parse each page? This is what I have so far:

    for(int i=0; i<pageList.Count; i++)
    {
        webBrowser1.Navigate(pageList[i]);
        //but i need to wait until the page is done loading
        //wait for the AJAX to finish
        //allow the JS to run
        parsePage();
    }
+1  A: 

The way you have it designed is not going to work well. Subscribe to the web browser's DocumentCompleted event to know when the document has finished loading. Calling Navigate in a tight loop will not work, because Navigate returns before the page has loaded. Instead, navigate to the first URL in pageList, parse the page from the DocumentCompleted handler, and then navigate to the next URL from that same handler.
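A minimal sketch of that flow, assuming a WinForms form that owns the browser. The names CrawlerForm, currentIndex, NavigateToNext and the example URLs are made up for illustration; ParsePage stands in for the parsePage() in the question.

    using System;
    using System.Collections.Generic;
    using System.Windows.Forms;

    public class CrawlerForm : Form
    {
        private readonly WebBrowser webBrowser1 = new WebBrowser { Dock = DockStyle.Fill };
        private readonly List<string> pageList;
        private int currentIndex = -1; // index of the page currently loading

        public CrawlerForm(List<string> urls)
        {
            pageList = urls;
            Controls.Add(webBrowser1);
            webBrowser1.DocumentCompleted += OnDocumentCompleted;
            // Start with the first page once the form is up.
            Load += (sender, e) => NavigateToNext();
        }

        private void NavigateToNext()
        {
            currentIndex++;
            if (currentIndex < pageList.Count)
                webBrowser1.Navigate(pageList[currentIndex]);
        }

        private void OnDocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
        {
            // The event also fires for frames; react only to the top-level document.
            if (e.Url != webBrowser1.Url)
                return;

            ParsePage();
            NavigateToNext(); // the next page starts only after this one is parsed
        }

        private void ParsePage()
        {
            // Placeholder: the real parsing would walk webBrowser1.Document here.
            Console.WriteLine(webBrowser1.Document.Title);
        }

        [STAThread]
        public static void Main()
        {
            // Hypothetical URLs, for illustration only.
            var urls = new List<string> { "http://example.com/", "http://example.org/" };
            Application.Run(new CrawlerForm(urls));
        }
    }

Note that DocumentCompleted only signals that the document itself has loaded. It does not wait for AJAX requests started by the page's scripts, so if the data arrives later you would still need some extra check before parsing, for example polling with a Timer until an expected element shows up.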

JP Alioto
A: 

Have you taken a look at the Html Agility Pack? It lets you read and write the DOM using XPath expressions.

The project is hosted on CodePlex.
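A rough sketch of what that looks like, assuming the HtmlAgilityPack assembly is referenced. The HTML string and the XPath query are made up for illustration.

    using System;
    using HtmlAgilityPack;

    public static class AgilityPackExample
    {
        public static void Main()
        {
            // Made-up markup standing in for a downloaded page.
            string html = "<html><body><a href='/one'>One</a><a href='/two'>Two</a></body></html>";

            var doc = new HtmlDocument();
            doc.LoadHtml(html);

            // SelectNodes returns null when nothing matches, so check before looping.
            HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@href]");
            if (links != null)
            {
                foreach (HtmlNode link in links)
                    Console.WriteLine(link.GetAttributeValue("href", "") + " " + link.InnerText);
            }
        }
    }

Keep in mind that the Agility Pack parses the markup it is given and does not run JavaScript, so content that the page loads through AJAX will only be there if the HTML was captured after the scripts ran.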

Zachary