I've been working on a web crawler written in C# using System.Windows.Forms.WebBrowser. I am trying to download a file from a website and save it on the local machine. More importantly, I would like this to be fully automated. The download is started by clicking a button that calls a JavaScript function, which triggers the download and displays a “Do you want to open or save this file?” dialog. I definitely do not want to be manually clicking “Save as” and typing in the file name.

I am aware of the download functions of HttpWebRequest and WebClient, but since the download is started by JavaScript, I do not know the URL of the file. FYI, the JavaScript is a doPostBack function that changes some values and submits a form.

I’ve tried getting focus on the Save As dialog from WebBrowser to automate it from there, without much success. I know there’s a way to force the download to save instead of asking to save or open by adding a header to the HTTP request, but I don’t know how to specify the file path to download to.

Any thoughts would be greatly appreciated.

Thanks, Sharath

A: 

I think you should prevent the download dialog from showing at all. Here is one possible way to do that:

  • The JavaScript code causes your WebBrowser control to navigate to a specific URL (which is what causes the download dialog to appear).

  • To prevent the WebBrowser control from actually navigating to this URL, attach an event handler to the Navigating event.

  • In your Navigating handler, check whether this is the navigation you want to stop. Is it the download URL? You could check for a file extension, for example; there should be some recognizable pattern. Use WebBrowserNavigatingEventArgs.Url to do so.

  • If this is the right URL, stop the navigation by setting the WebBrowserNavigatingEventArgs.Cancel property to true.

  • Continue the download yourself with the HttpWebRequest or WebClient classes.
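The steps above could be sketched roughly as follows. This is only a minimal illustration, not a tested solution: the DownloadInterceptor class, the ".pdf" extension check, and the save directory are assumptions made for the example, and forwarding Document.Cookie will not include HttpOnly cookies, so a site that relies on them for its session may still reject the request.

```csharp
using System;
using System.IO;
using System.Net;
using System.Windows.Forms;

// Hypothetical helper: cancels navigation to a download URL and
// fetches the file with WebClient instead of showing the dialog.
public class DownloadInterceptor
{
    private readonly string _saveDirectory;

    public DownloadInterceptor(WebBrowser browser, string saveDirectory)
    {
        _saveDirectory = saveDirectory;
        browser.Navigating += OnNavigating;
    }

    private void OnNavigating(object sender, WebBrowserNavigatingEventArgs e)
    {
        // Assumption: the download URL can be recognized by its extension.
        // Replace this test with whatever pattern the real site uses.
        if (!e.Url.AbsolutePath.EndsWith(".pdf", StringComparison.OrdinalIgnoreCase))
            return;

        // Stop the WebBrowser control from navigating (and prompting).
        e.Cancel = true;

        string fileName = Path.GetFileName(e.Url.AbsolutePath);
        string target = Path.Combine(_saveDirectory, fileName);

        using (var client = new WebClient())
        {
            // Forward the cookies visible to script so the server sees the
            // same session; HttpOnly cookies are not available this way.
            HtmlDocument doc = ((WebBrowser)sender).Document;
            if (doc != null && !string.IsNullOrEmpty(doc.Cookie))
                client.Headers[HttpRequestHeader.Cookie] = doc.Cookie;

            client.DownloadFile(e.Url, target);
        }
    }
}
```

It would be wired up once after creating the control, e.g. `new DownloadInterceptor(webBrowser1, @"C:\Downloads");`, where both the control name and the path are placeholders. DownloadFile blocks the UI thread, so a real crawler would probably prefer DownloadFileAsync.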

Have a look at this page for more info on the event:
http://msdn.microsoft.com/en-us/library/system.windows.forms.webbrowser.navigating.aspx

Zyphrax
I've already tried getting the URL by using an HTTP debugger to look at the HTTP requests and responses. The URL is exactly the same; one is a GET request, the other a POST request. I also just tried your suggestion, without luck.
Sharath
You might want to use the WebBrowser control to get to the very end, just before the form would be submitted, and then extract the POST destination of the form using the DOM (get a reference to the HTML document body and from there make your way to the form).
Zyphrax
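A rough sketch of that idea, under several assumptions: the page is an ASP.NET Web Forms page whose doPostBack fills the hidden __EVENTTARGET and __EVENTARGUMENT fields, the first form in the document is the one being submitted, and only input elements matter. FormPostDownloader and its parameters are names made up for this example.

```csharp
using System;
using System.Collections.Specialized;
using System.IO;
using System.Net;
using System.Windows.Forms;

// Hypothetical helper: replays the form POST outside the browser
// and writes the response body to disk.
public static class FormPostDownloader
{
    public static void Download(WebBrowser browser, string eventTarget, string savePath)
    {
        HtmlDocument doc = browser.Document;
        if (doc == null || doc.Forms.Count == 0)
            throw new InvalidOperationException("No form found in the loaded document.");

        // Assumption: the first form is the one the script submits.
        HtmlElement form = doc.Forms[0];

        // Resolve the form's action against the current page URL;
        // an empty action resolves to the page itself.
        Uri postUrl = new Uri(browser.Url, form.GetAttribute("action"));

        // Simplification: copy every named <input>. A real browser skips
        // unchecked checkboxes and unclicked buttons, and also sends
        // <select> and <textarea> values.
        var fields = new NameValueCollection();
        foreach (HtmlElement input in form.GetElementsByTagName("input"))
        {
            string name = input.GetAttribute("name");
            if (!string.IsNullOrEmpty(name))
                fields[name] = input.GetAttribute("value");
        }

        // Assumption: these are the values doPostBack would have set.
        fields["__EVENTTARGET"] = eventTarget;
        fields["__EVENTARGUMENT"] = "";

        using (var client = new WebClient())
        {
            // Reuse the script-visible cookies (HttpOnly cookies are missing).
            if (!string.IsNullOrEmpty(doc.Cookie))
                client.Headers[HttpRequestHeader.Cookie] = doc.Cookie;

            byte[] body = client.UploadValues(postUrl, "POST", fields);
            File.WriteAllBytes(savePath, body);
        }
    }
}
```

The eventTarget argument would be whatever control ID the button passes to doPostBack on the actual page, which can be read from the button's onclick or href attribute.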
A: 

A similar solution is available at http://social.msdn.microsoft.com/Forums/en/csharpgeneral/thread/d338a2c8-96df-4cb0-b8be-c5fbdd7c9202/?prof=required

This works perfectly if there is a direct URL that includes the name of the file being downloaded.

But some URLs generate the file dynamically. In that case the URL does not contain a file name; the website creates the file only after the URL is requested, and then the open/save dialog appears.

For example, some links generate a PDF file on the fly.

How can this type of URL be handled?

Vikram Gehlot