On a regular basis I have to do the following manually in a web browser:
1. Go to an HTTPS website.
2. Log in through a web form.
3. Click a link to download a large file (135 MB).
I would like to automate this process using .NET.
Some days ago I posted this question here. Thanks to a piece of code by Rubens Farias, I can now perform steps 1 and 2 above. After step 2 I can read the HTML of the page that contains the URL of the file to download (using afterLoginPage = reader.ReadToEnd()). This page only appears if the login succeeds, so step 2 is confirmed to work.
My question now is, of course, how to perform step 3. I have tried a few things, but to no avail: access to the file was denied despite the successful login.
To clarify, I have posted the code below, without the actual login details and site addresses. At the end, the variable afterLoginPage contains the HTML of the post-login page, which includes the link to the file I would like to download. That link also starts with https.
' Requires Imports for System.Collections.Generic, System.IO, System.Linq, System.Net,
' System.Text and System.Text.RegularExpressions, plus a reference to System.Web.
Dim httpsSite As String = "https://www.test.test/user/login" ' enter correct address
Dim formPage As String = ""
Dim afterLoginPage As String = ""

' Get postback data and cookies
Dim cookies As New CookieContainer()
Dim getRequest As HttpWebRequest = DirectCast(WebRequest.Create(httpsSite), HttpWebRequest)
getRequest.CookieContainer = cookies
getRequest.Method = "GET"

' Route the request through our proxy, using the current Windows credentials
Dim wp As New WebProxy("[our proxy IP address]", [our proxy port number])
wp.Credentials = CredentialCache.DefaultCredentials
getRequest.Proxy = wp

' Read the HTML of the login form
Using form As HttpWebResponse = DirectCast(getRequest.GetResponse(), HttpWebResponse)
    Using response As New StreamReader(form.GetResponseStream(), Encoding.UTF8)
        formPage = response.ReadToEnd()
    End Using
End Using

' Collect the form fields to post back
Dim inputs As New Dictionary(Of String, String)()
inputs.Add("form_build_id", "[some code I'd like to keep secret]")
inputs.Add("form_id", "user_login")
For Each input As Match In Regex.Matches(formPage, "<input.*?name=""(?<name>.*?)"".*?(?:value=""(?<value>.*?)"".*?)? />", RegexOptions.IgnoreCase Or RegexOptions.ECMAScript)
    ' Skip the two fields that were already added above
    If input.Groups("name").Value <> "form_build_id" AndAlso _
       input.Groups("name").Value <> "form_id" Then
        inputs.Add(input.Groups("name").Value, input.Groups("value").Value)
    End If
Next
inputs("name") = "[our login name]"
inputs("pass") = "[our login password]"

' Build the URL-encoded request body: key1=value1&key2=value2...
Dim buffer As Byte() = Encoding.UTF8.GetBytes( _
    [String].Join("&", _
        Array.ConvertAll(Of KeyValuePair(Of String, String), String)(inputs.ToArray(), _
            Function(item As KeyValuePair(Of String, String)) item.Key & "=" & System.Web.HttpUtility.UrlEncode(item.Value))))

' Post the login form with the same cookie container and proxy
Dim postRequest As HttpWebRequest = DirectCast(WebRequest.Create(httpsSite), HttpWebRequest)
postRequest.CookieContainer = cookies
postRequest.Method = "POST"
postRequest.ContentType = "application/x-www-form-urlencoded"
postRequest.Proxy = wp

' send username/password
Using stream As Stream = postRequest.GetRequestStream()
    stream.Write(buffer, 0, buffer.Length)
End Using

' get response from login page
Using postResponse As HttpWebResponse = DirectCast(postRequest.GetResponse(), HttpWebResponse)
    Using reader As New StreamReader(postResponse.GetResponseStream(), Encoding.UTF8)
        afterLoginPage = reader.ReadToEnd()
    End Using
End Using
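
For reference, here is a minimal sketch of what step 3 might look like. It assumes the site authorizes the download through the session cookies already held in cookies, which may not be the whole story here. FindDownloadUrl, DownloadFile, the ".zip" extension and the target path are hypothetical names and values chosen for illustration and are not part of the code above. Sending a Referer header is also only a guess at what the server might check.

```vbnet
Imports System
Imports System.IO
Imports System.Net
Imports System.Text.RegularExpressions

Module DownloadSketch

    ' Hypothetical helper: returns the first href in the HTML that ends with the given
    ' extension, resolved against the page address, or Nothing if no such link is found.
    Function FindDownloadUrl(html As String, pageUrl As String, extension As String) As Uri
        Dim pattern As String = "href=""(?<url>[^""]+" & Regex.Escape(extension) & ")"""
        Dim m As Match = Regex.Match(html, pattern, RegexOptions.IgnoreCase)
        If Not m.Success Then Return Nothing
        ' Decode entities such as &amp; and resolve relative links against the page address.
        Dim link As String = System.Web.HttpUtility.HtmlDecode(m.Groups("url").Value)
        Return New Uri(New Uri(pageUrl), link)
    End Function

    ' Hypothetical helper: streams the file to disk, reusing the login cookies and proxy.
    Sub DownloadFile(fileUrl As Uri, cookies As CookieContainer, proxy As IWebProxy, _
                     referer As String, targetPath As String)
        Dim request As HttpWebRequest = DirectCast(WebRequest.Create(fileUrl), HttpWebRequest)
        request.CookieContainer = cookies   ' same container, so the session cookie is sent
        request.Proxy = proxy
        request.Referer = referer           ' assumption: the site may check the referring page
        request.Method = "GET"

        Using response As HttpWebResponse = DirectCast(request.GetResponse(), HttpWebResponse)
            Using source As Stream = response.GetResponseStream()
                Using target As FileStream = File.Create(targetPath)
                    ' Copy in 80 KB chunks so the 135 MB file is never held in memory at once.
                    Dim chunk(81919) As Byte
                    Dim read As Integer = source.Read(chunk, 0, chunk.Length)
                    While read > 0
                        target.Write(chunk, 0, read)
                        read = source.Read(chunk, 0, chunk.Length)
                    End While
                End Using
            End Using
        End Using
    End Sub

End Module
```

With the variables from the code above, this would be called roughly as follows (again with made-up values for the extension and path):

Dim fileUrl As Uri = DownloadSketch.FindDownloadUrl(afterLoginPage, httpsSite, ".zip")
If fileUrl IsNot Nothing Then DownloadSketch.DownloadFile(fileUrl, cookies, wp, httpsSite, "C:\temp\download.zip")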