When controlling IE instance via MSHTML, how to suppress Open/Save dialogs for non-HTML content?

I need to get data from another system and import it into ours. Due to budget constraints, no development (e.g. WS) can be done on the other side for some time, so my only option for now is web scraping.

The remote site is ASP.NET-based, so simple HTTP requests won't work: too much JS.

I wrote a simple C# application that uses MSHTML and SHDocVw to control an IE instance. So far so good: I can log in, navigate to the desired page, populate the required fields and submit the form.

Then I face a couple of problems:

The first is that the report opens in another window. I suspect I can attach to that window too by enumerating the IE windows in the system.

The second, more troublesome, is that the report itself is a CSV file and triggers the Open/Save dialog. I'd like to avoid the dialog and make IE save the file to a given location, OR I'm fine with programmatically clicking the dialog buttons too (how?)

I'm really not a Windows guy (Unix/J2EE), and I hope someone with more knowledge can give me a hint on how to do these tasks.

Thanks!

UPDATE

I've found a promising document on MSDN: http://msdn.microsoft.com/en-ca/library/aa770041.aspx

Control the kinds of content that are downloaded and what the WebBrowser Control does with them once they are downloaded. For example, you can prevent videos from playing, script from running, or new windows from opening when users click on links, or prevent Microsoft ActiveX controls from downloading or executing.

Slowly reading through...

UPDATE 2: MADE IT WORK, SORT OF...

Finally I made it work, but in an ugly way. Essentially, I register a "before navigate" handler and, in that handler, remember the URL if it matches my target file. A second "file download" handler then cancels IE's own download, and I use the WebClient class to fetch that temporary URL directly.

I cannot copy the whole code here because it contains a lot of garbage, but here are the essential parts:

Installing the handlers:

_IE2.FileDownload += new DWebBrowserEvents2_FileDownloadEventHandler(IE2_FileDownload);
_IE.BeforeNavigate2 += new DWebBrowserEvents2_BeforeNavigate2EventHandler(IE_OnBeforeNavigate2);

Recording the URL and then cancelling the download (thus preventing the Save dialog from appearing):

public string downloadUrl;

// Remember the URL of any navigation that targets a CSV file; the navigation itself is allowed to proceed.
void IE_OnBeforeNavigate2(Object ob1, ref Object URL, ref Object Flags, ref Object Name, ref Object da, ref Object Head, ref bool Cancel)
{
    Console.WriteLine("Before Navigate2 " + URL);

    if (URL.ToString().EndsWith(".csv"))
    {
        Console.WriteLine("CSV file");
        downloadUrl = URL.ToString();
    }

    Cancel = false;
}

// Cancel IE's own download so that the Open/Save dialog never appears.
void IE2_FileDownload(bool activeDocument, ref bool cancel)
{
    Console.WriteLine("FileDownload, downloading " + downloadUrl + " instead");
    cancel = true;
}

void IE_OnNewWindow2(ref Object o, ref bool cancel)
{
    Console.WriteLine("OnNewWindow2");

    _IE2 = new SHDocVw.InternetExplorer();
    _IE2.BeforeNavigate2 += new DWebBrowserEvents2_BeforeNavigate2EventHandler(IE_OnBeforeNavigate2);
    _IE2.Visible = true;
    o = _IE2;

    _IE2.FileDownload += new DWebBrowserEvents2_FileDownloadEventHandler(IE2_FileDownload);

    _IE2.Silent = true;

    cancel = false;
}
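
One fragile spot in the handler above is the `EndsWith(".csv")` check: it would miss a report URL that carries a query string or uses an upper-case extension. Whether the real report URLs look like that is an assumption on my part, so treat the following as a minimal sketch rather than a required fix. The class name `DownloadUrlFilter` and the method name `IsCsvUrl` are hypothetical and not part of the original code.

```csharp
using System;

// Hypothetical helper, not part of the original post.
public static class DownloadUrlFilter
{
    // Returns true when the path component of an absolute URL ends in ".csv",
    // ignoring case and ignoring any query string or fragment.
    public static bool IsCsvUrl(string url)
    {
        Uri uri;
        if (!Uri.TryCreate(url, UriKind.Absolute, out uri))
        {
            // Not an absolute URL, so it cannot be the report download.
            return false;
        }

        return uri.AbsolutePath.EndsWith(".csv", StringComparison.OrdinalIgnoreCase);
    }
}
```

In the handler, the condition would then read `if (DownloadUrlFilter.IsCsvUrl(URL.ToString()))`.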

And in the calling code, the recorded URL is used for a direct download:

...
driver.ClickButton(".*_btnRunReport");
driver.WaitForComplete();

// Crude fixed wait to give the popup time to navigate to the CSV URL.
Thread.Sleep(10000);

WebClient Client = new WebClient();
Client.DownloadFile(driver.downloadUrl, "C:\\affinity.dump");

(driver is a simple wrapper over the IE instance, _IE)

Hope that helps someone.

A: 

The easiest way to do this would be to adjust the MIME type for CSV files on the system that does the downloading. IE is trying to download the file because of the action associated with .CSV files.

I think you can change this in Windows Explorer by going to Tools > Folder Options > File Types. If you associate CSV files with Internet Explorer then the CSV file will open in IE. At that point you should be able to use IE automation to save the current open document to a file.

Dave Swersky
I cannot modify the remote system in any way. Can I change IE preferences programmatically for the duration of the process (not affecting user settings, that is)?
Vladimir Dyuzhev
You don't have to modify the remote system; you need to change the MIME type settings on the system doing the scraping.
Dave Swersky
Sorry, I misread your first statement. Still, how do I change IE's settings from within my app? I don't want to change them permanently, as the user (BA) downloads quite a lot of reports, and some of them are CSV.
Vladimir Dyuzhev
A: 

You cannot fully control how the browser behaves, because the end user can set their browser to always open a file with a certain content type.

simon