I'm trying to programmatically download some PDF documents with a C# Windows Forms application. So far, I've got far enough to obtain a unique URL for each page that triggers the PDF download.

Each link is a web page that submits a form via POST as soon as the page has loaded:

function window_onload() {
    Form1.submit();
}

Then the PDF starts downloading. I would like to intercept that download and save the PDF automatically to my local machine. I want to do this because there are around 15-20 PDFs that I need to download every week.

+1  A: 

I would use an HttpWebRequest object.

Depending on the size of the PDFs and the response time of the servers, you could do this asynchronously or synchronously. This is the synchronous flavor, using the GetResponse() method.

//needs: using System.Net; using System.Text;
void DoPDFDownload(string strMyUrl, string strPostData, string saveLocation)
{
    //create the request
    var wr = (HttpWebRequest)WebRequest.Create(strMyUrl);
    wr.Method = "POST";
    //Identify content type... strPostData should be url encoded per this next
    //declaration
    wr.ContentType = "application/x-www-form-urlencoded";
    //just for good measure, set cookies if necessary for session management, etc.
    wr.CookieContainer = new CookieContainer();

    //encode the body first so ContentLength is a byte count, not a character count
    byte[] postBytes = Encoding.UTF8.GetBytes(strPostData);
    wr.ContentLength = postBytes.Length;

    using (var reqStream = wr.GetRequestStream())
    {
        reqStream.Write(postBytes, 0, postBytes.Length);
    }

    using (var resp = wr.GetResponse())
    {
        //feeling rather lazy at this point, but if you need further code for writing
        //response stream to a file stream, I can provide.
        //...
    }
}
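
The step left open in the method above, writing the response stream to a file, could look roughly like this. It is only a minimal sketch and has not been run against the asker's server: PdfSaver and SaveStreamToFile are made-up names, and Stream.CopyTo assumes .NET 4.0 or later.

```csharp
using System;
using System.IO;

static class PdfSaver
{
    // Hypothetical helper: copies any readable stream to a file,
    // overwriting the file if it already exists.
    public static void SaveStreamToFile(Stream source, string saveLocation)
    {
        using (var file = File.Create(saveLocation))
        {
            // CopyTo reads the source in chunks, so the whole PDF
            // never has to fit in memory at once.
            source.CopyTo(file);
        }
    }
}
```

Inside DoPDFDownload you would then call PdfSaver.SaveStreamToFile(resp.GetResponseStream(), saveLocation) once GetResponse() has returned.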

The following is a little method you could copy/paste into LINQPad to get an idea of how these classes work.

void DoSpeedTestDownloadFromFCC()
{
    string strMyUrl = "http://data.fcc.gov/api/speedtest/find?latitude=38.0&longitude=-77.5&format=json";
    //create the request
    var wr = (HttpWebRequest)WebRequest.Create(strMyUrl);
    //Note that I changed the method for the webservice's standard.
    //No content type or content length on GET requests, since there is no body.
    wr.Method = "GET";
    //just for good measure, set cookies if necessary for session management, etc.
    wr.CookieContainer = new CookieContainer();

    using (var resp = wr.GetResponse())
    using (StreamReader sr = new StreamReader(resp.GetResponseStream()))
    {
        //here you would write the file to disk using your preferred method
        //in LINQPad, this just outputs the text to the console.
        sr.ReadToEnd().Dump();
    }
}
fauxtrot
What exactly is the strPostData? Also, could you help with the response stream to file stream?
blommer
strPostData is the data to be sent to the server, in the form of a string. Something like "filename=my%20pdf%20file.pdf". See http://en.wikipedia.org/wiki/Percent-encoding for the encoding rules. I'll have an edit momentarily with more information on saving the file to disk.
fauxtrot
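
To make the strPostData comment concrete, here is a small sketch of how such a string could be assembled from name/value pairs. PostData and Build are hypothetical names, and the field names you actually need are whatever the form on the page (Form1 in the question) submits, which is not shown here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class PostData
{
    // Hypothetical helper: percent-encodes each name and value,
    // then joins the pairs as name=value&name=value.
    public static string Build(IEnumerable<KeyValuePair<string, string>> fields)
    {
        return string.Join("&", fields.Select(f =>
            Uri.EscapeDataString(f.Key) + "=" + Uri.EscapeDataString(f.Value)));
    }
}
```

Passing a single pair with the name "filename" and the value "my pdf file.pdf" yields "filename=my%20pdf%20file.pdf", the same shape as the example in the comment above.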
Also note, this doesn't interrupt anything the browser does; instead, the web request itself 'acts' as a web browser. This class is very powerful, and can also work through proxies.
fauxtrot