Suppose I want to automatically download a file from a URL on a website that requires authentication. I log in using an automated WebBrowser control based on Internet Explorer. Once I am logged in and have grabbed the link to the file, navigating to it directly in IE6 brings up the modal "do you want to open or save this file" dialog. And when I tried to download it with the C# WebClient class, it didn't work: all that got downloaded was a short piece of meaningless JavaScript. Out of curiosity I also tried the WebClient approach on Gmail to download attachments, and it didn't work there either (I know I can get them from Gmail via the POP3 interface; it was just an experiment).

Well, so this makes me wonder about the underlying mechanics of it all. First of all, maybe I am using WebClient in the wrong way? Or maybe there is some other standard C# class for downloading files in such circumstances?

If not, is it possible for the app to spoof the behavior of the browser so that the server would think that the request for file came from it, even though it actually comes from another part of the same process? What exactly is the browser doing in this situation that lets it download the files while WebClient cannot do so?

+1  A: 

This usually has to do with cookies, or other HTTP request headers that your browser sends. The web server cannot distinguish between a human-driven web browser and a code-controlled "webclient" as long as they send exactly the same headers.

In a human-driven "session", authentication (entering a username/password) usually causes the server to send some cookies to the browser, and you stay "logged on" because your browser keeps sending those cookies back to the server with each subsequent request.

So, if your webclient can send (post?) the credentials correctly, and keeps storing and resending cookies (and/or the "Referer"/"User-Agent" headers) as needed, it shouldn't be any different (in the end it's just a chain of HTTP requests and responses).
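As a rough illustration of storing and resending cookies, here is a minimal sketch of a WebClient subclass that attaches a CookieContainer to every request. The class name `CookieAwareWebClient`, the `ImportBrowserCookies` helper, and the User-Agent string are assumptions made for this example, not something from the question. The helper assumes the cookies come from the WebBrowser control's `Document.Cookie` string ("name=value; name2=value2") and that you pass the site's root URL.

```csharp
using System;
using System.Net;

// Hypothetical helper: a WebClient that stores and resends cookies.
public class CookieAwareWebClient : WebClient
{
    private readonly CookieContainer cookies = new CookieContainer();

    public CookieContainer Cookies { get { return cookies; } }

    // Assumed placeholder; ideally the same string the browser control sends.
    public string UserAgent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)";

    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);
        HttpWebRequest http = request as HttpWebRequest;
        if (http != null)
        {
            // Send stored cookies and collect any new ones from the response.
            http.CookieContainer = cookies;
            http.UserAgent = UserAgent;
        }
        return request;
    }

    // Copies cookies from a "name=value; name2=value2" string,
    // the format returned by WebBrowser.Document.Cookie.
    public void ImportBrowserCookies(Uri site, string documentCookie)
    {
        if (string.IsNullOrEmpty(documentCookie)) return;
        foreach (string pair in documentCookie.Split(';'))
        {
            int eq = pair.IndexOf('=');
            if (eq <= 0) continue; // skip fragments without a name
            string name = pair.Substring(0, eq).Trim();
            string value = pair.Substring(eq + 1).Trim();
            cookies.Add(site, new Cookie(name, value));
        }
    }
}
```

After logging in through the browser control, you would call `ImportBrowserCookies` with the site URL and `Document.Cookie`, then call `DownloadFile` on the same client. One caveat: cookies marked HttpOnly do not appear in `Document.Cookie`, so this sketch may miss the session cookie on some sites, and a cookie value containing a comma may be rejected by the `Cookie` class.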

There may be safeguards in the particular "control" you're using to prevent it (or the API) from being used by malware, though. The "A program is trying to send e-mail on your behalf, are you sure you want to allow this?" prompt in MS Outlook, with its accompanying 5-second delay, is one such example. So, if the particular API you're using has this kind of prompt/precaution, you may not be able to handle things completely silently.

OzgurH
+2  A: 

If you ever want to understand the difference in what two network programs do, you have to look at the network traffic. Use Fiddler or something similar to see what each program is doing, then compare the two.
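To make sure the WebClient traffic shows up next to the browser's, one option is to point the client at Fiddler's proxy explicitly. This is a small sketch that assumes Fiddler is running on the same machine on its default port, 8888; the `FiddlerDebug` class and `CreateProxiedClient` method are hypothetical names used only for this example.

```csharp
using System;
using System.Net;

public static class FiddlerDebug
{
    // Assumes Fiddler is listening locally on its default port, 8888.
    public static WebClient CreateProxiedClient()
    {
        WebClient client = new WebClient();
        // Route every request from this client through the local proxy.
        client.Proxy = new WebProxy("127.0.0.1", 8888);
        return client;
    }
}
```

With both the browser and the client going through the proxy, you can compare the two requests header by header (Cookie, Referer, User-Agent) and see which one the WebClient request is missing. HTTPS traffic will only be readable if HTTPS decryption is enabled in Fiddler.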

John Saunders