I'm having a problem screen-scraping some data from this website using the MSHTML COM component. I have a WebBrowser control on my WPF form, and the code that retrieves the HTML elements is in the WebBrowser's LoadCompleted event handler. After I set the value of the HTMLInputElement and call the click method on the HTMLInputButtonElement, the page refuses to submit the request and display the next page.

When I analysed the HTML, I found that the button's onclick attribute calls a JavaScript function that processes the request, which makes me wonder whether calling that JavaScript function is causing the problem. Oddly enough, when I take my code out of the LoadCompleted handler and put it inside a button click event, it does take me to the next page, whereas the LoadCompleted version doesn't. Doing that, however, defeats the point of trying to screen-scrape the page automatically.

Another thought: with the code inside the LoadCompleted handler, perhaps the HTMLInputButtonElement is not fully rendered on the page yet, so the click event doesn't fire. Yet when I inspected the object at run time, it did hold the submit button element, and the state reported that loading was complete, which baffles me even more.

Here is the code in my LoadCompleted handler, including the call to the button's click method:

// Requires: using System.Windows.Navigation; using mshtml;
// (mshtml is the COM reference to the "Microsoft HTML Object Library")
private void browser_LoadCompleted(object sender, NavigationEventArgs e)
{
    HTMLDocument dom = (HTMLDocument)browser.Document;

    // getElementsByName already filters by name; take the first input element found.
    HTMLInputElement inputBox = null;
    foreach (object item in dom.getElementsByName("PCL_NO_FROM.PARCEL_RANGE.XTRACKING.1-1-1."))
    {
        inputBox = item as HTMLInputElement;
        if (inputBox != null) break;
    }

    // Find the submit button the same way.
    HTMLInputButtonElement submitButton = null;
    foreach (object item in dom.getElementsByName("SUBMIT.DUM_CONTROLS.XTRACKING.1-1."))
    {
        submitButton = item as HTMLInputButtonElement;
        if (submitButton != null) break;
    }

    // LoadCompleted also fires for later pages, where these controls may not
    // exist, so return instead of throwing a NullReferenceException.
    if (inputBox == null || submitButton == null)
        return;

    inputBox.value = "Test";
    submitButton.click();
}

FYI: This is the URL of the web page I'm trying to access using MSHTML, http://track.dhl.co.uk/tracking/wrd/run/wt_xtrack_pw.entrypoint.

A: 

There are many possibilities:

  • You could try putting your code in other event handlers, such as Navigation Completed or Download Completed.

  • You may need to explicitly run the button's onclick JavaScript after calling the click() method.

  • Using the MS WebBrowser control is easier than working with the MSHTML COM component directly.

  • To make life easier, you could use a web-scraping library such as the IRobotSoft ActiveX control to automate the entire process.
seagulf
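The first two suggestions can be combined into a minimal C# sketch, assuming a WPF project with a COM reference to the Microsoft HTML Object Library (mshtml). It creates the WebBrowser in code rather than XAML. The class name TrackingScraperWindow, the FindByName helper and the OnClickScript placeholder are hypothetical; the element names and URL are the ones from the question. Instead of clicking inside LoadCompleted, it queues the work with Dispatcher.BeginInvoke so it runs after the event handler has returned, which may behave more like the button-click case that worked. Treat it as a sketch under these assumptions, not a verified fix for this particular site.

```csharp
using System;
using System.Windows;
using System.Windows.Controls;
using System.Windows.Navigation;
using System.Windows.Threading;
using mshtml; // COM reference: "Microsoft HTML Object Library"

// Hypothetical window that creates its WebBrowser in code instead of XAML.
public class TrackingScraperWindow : Window
{
    // Hypothetical placeholder: if click() alone does not submit, set this to the
    // exact script from the button's onclick attribute to run it explicitly.
    private static readonly string OnClickScript = null;

    private readonly WebBrowser browser = new WebBrowser();
    private bool submitted; // stops the handler from re-running on the results page

    public TrackingScraperWindow()
    {
        Content = browser;
        browser.LoadCompleted += Browser_LoadCompleted;
        browser.Navigate(new Uri("http://track.dhl.co.uk/tracking/wrd/run/wt_xtrack_pw.entrypoint"));
    }

    private void Browser_LoadCompleted(object sender, NavigationEventArgs e)
    {
        // LoadCompleted is raised for each page, including the one loaded after submitting.
        if (submitted) return;

        // Don't touch the form inside the event itself; queue the work so it runs
        // after this handler has returned and the dispatcher is idle.
        Dispatcher.BeginInvoke(DispatcherPriority.ApplicationIdle, new Action(FillAndSubmit));
    }

    private void FillAndSubmit()
    {
        HTMLDocument dom = browser.Document as HTMLDocument;
        if (dom == null) return;

        HTMLInputElement inputBox =
            FindByName<HTMLInputElement>(dom, "PCL_NO_FROM.PARCEL_RANGE.XTRACKING.1-1-1.");
        HTMLInputButtonElement submitButton =
            FindByName<HTMLInputButtonElement>(dom, "SUBMIT.DUM_CONTROLS.XTRACKING.1-1.");
        if (inputBox == null || submitButton == null) return; // form not on this page

        inputBox.value = "Test";
        submitted = true;

        if (OnClickScript == null)
            submitButton.click(); // simulates a user click, which runs the onclick handler
        else
            dom.parentWindow.execScript(OnClickScript, "JavaScript"); // run the onclick code directly
    }

    // Returns the first element with the given name attribute that is of type T, or null.
    private static T FindByName<T>(HTMLDocument dom, string name) where T : class
    {
        foreach (object item in dom.getElementsByName(name))
        {
            T match = item as T;
            if (match != null) return match;
        }
        return null;
    }
}
```

To try it, show the window from the application's startup code, for example `new TrackingScraperWindow().Show();` in `App.OnStartup`. The `submitted` flag matters because LoadCompleted is typically raised again for the page that loads after the form is submitted; without it, the handler would try to fill in the form on the results page.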