I first posted this: HttpWebRequest: How to find a postal code at Canada Post through a WebRequest with x-www-form-enclosed?.

I have since changed my code following AnthonyWJones's suggestions.

Continuing my investigation, I have noticed that the content-type returned by Canada Post is more likely to be "application/xhtml+xml, text/xml, text/html; charset=utf-8".

My questions are:

  1. How do we issue a WebRequest against a website that uses such a content-type?
  2. Do we have to keep using the NameValueCollection object?
  3. According to Scott Lance, who generously provided valuable information on my preceding question, the WebRequest should return the information whatever the content-type may be. Am I missing something here?
  4. Do I have to change my code because of the content-type change?

Here is my code so that it might be easier to understand my progress.

    internal class PostalServicesFactory {
        /// <summary>
        /// Initializes an instance of GI.BusinessSolutions.Services.PostalServices.Types.PostalServicesFactory class.
        /// </summary>
        internal PostalServicesFactory() {
        }

        /// <summary>
        /// Finds a Canadian postal code for the provided Canadian address.
        /// </summary>
        /// <param name="address">The instance of GI.BusinessSolutions.Services.PostalServices.ICanadianCityAddress for which to find the postal code.</param>
        /// <returns>The postal code found, otherwise null.</returns>
        internal string FindPostalCode(ICanadianCityAddress address) {
            if (address == null)
                throw new InvalidOperationException("No valid address specified.");

            using (ServicesWebClient swc = new ServicesWebClient()) {
                var values = new System.Collections.Specialized.NameValueCollection();

                values.Add("streetNumber", address.StreetNumber.ToString());
                values.Add("numberSuffix", address.NumberSuffix);
                values.Add("suite", address.Suite);
                values.Add("streetName", address.StreetName);
                values.Add("streetDirection", address.StreetDirection);
                values.Add("city", address.City);
                values.Add("province", address.Province);

                byte[] resultData = swc.UploadValues(@"http://www.canadapost.ca/cpotools/apps/fpc/personal/findByCity", "POST", values);

                return Encoding.UTF8.GetString(resultData);
            }
        }

        private class ServicesWebClient : WebClient {
            public ServicesWebClient()
                : base() {
            }

            protected override WebRequest GetWebRequest(Uri address) {
                var request = (HttpWebRequest)base.GetWebRequest(address);
                request.CookieContainer = new CookieContainer();
                return request;
            }
        }
    }

This code actually returns the HTML source of the form that one must fill in with the required information in order to proceed with the postal code search. What I want is the HTML source (or whatever the response may be) that contains the postal code that was found.

EDIT: Here's the WebException I get now: "Unable to send a content body with this type of verb." (This is a translation from the French exception "Impossible d'envoyer un corps de contenu avec ce type de verbe.")

Here's my code:

    internal string FindPostalCode(string url, ICanadianAddress address) {
        string htmlResult = null;

        using (var swc = new ServiceWebClient()) {
            var values = new System.Collections.Specialized.NameValueCollection();

            values.Add("streetNumber", address.StreetNumber.ToString());
            values.Add("numberSuffix", address.NumberSuffix);
            values.Add("suite", address.Suite);
            values.Add("streetName", address.StreetName);
            values.Add("streetDirection", address.StreetDirection);
            values.Add("city", address.City);
            values.Add("province", address.Province);

            swc.UploadValues(url, @"POST", values);
            string redirectUrl = swc.ResponseHeaders.GetValues(@"Location")[0];
            => swc.UploadValues(redirectUrl, @"GET", values);
        }

        return htmlResult;
    }

The line that causes the exception is marked with "=>". It seems that I can't use GET as the method, yet this is what I have been told to do...

Any idea what I'm missing here? I am trying to do what Justin (see answer) recommended.

Thanks in advance for any help! :-)

+2  A: 

As an introduction to the world of screen-scraping, you've picked a very hard case! Canada Post's lookup page works like this:

  1. the first page is a form which accepts the address values
  2. this page POSTs to a second URL.
  3. that second URL in turn redirects (using an HTTP 302 redirect) to a third URL which actually shows you the HTML response containing the postal code.

Making matters worse, the page in step #3 needs to know the cookie set in step #1. So you need to use the same CookieContainer for all three requests (although it may possibly be sufficient to send the same CookieContainer to #2 and #3 only).
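To illustrate the cookie point: the ServicesWebClient in the question assigns a new CookieContainer inside GetWebRequest, so every request starts with an empty cookie jar. Below is a minimal sketch of a variant that keeps one container for its whole lifetime. The class name CookieAwareWebClient and its Cookies property are hypothetical names chosen for this example, and I haven't run it against the Canada Post site.

```csharp
using System;
using System.Net;

// Hypothetical variant of the question's ServicesWebClient: it reuses ONE
// CookieContainer for every request instead of creating a new one each time.
public class CookieAwareWebClient : WebClient
{
    private readonly CookieContainer cookies;

    public CookieAwareWebClient()
        : this(new CookieContainer())
    {
    }

    // Pass in an existing container to share cookies with other requests.
    public CookieAwareWebClient(CookieContainer cookies)
    {
        if (cookies == null)
            throw new ArgumentNullException("cookies");
        this.cookies = cookies;
    }

    public CookieContainer Cookies
    {
        get { return cookies; }
    }

    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);
        var http = request as HttpWebRequest;
        if (http != null)
        {
            // Same container on every request, so cookies set by an
            // earlier response are sent with the later requests.
            http.CookieContainer = cookies;
        }
        return request;
    }
}
```

Any cookie set by the response to one request is then sent automatically with later requests made through the same instance (or through any other request that is handed the same container).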

Furthermore, you may need to send additional HTTP headers in these requests as well, like Accept. I suspect where you're running into problems is that HttpWebRequest by default handles redirects transparently for you, but when it transparently redirects it may not add the right HTTP headers necessary to impersonate a browser.
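As a rough sketch of what sending browser-like headers could look like with HttpWebRequest: the helper name BrowserLikeRequest and the specific Accept and User-Agent values below are assumptions for illustration only. Use Fiddler to see which headers the site really needs.

```csharp
using System;
using System.Net;

// Hypothetical helper: builds a request that carries browser-like headers.
public static class BrowserLikeRequest
{
    public static HttpWebRequest Create(Uri address, CookieContainer cookies)
    {
        var request = (HttpWebRequest)WebRequest.Create(address);
        request.CookieContainer = cookies;

        // Assumed values; copy the real ones from a browser session in Fiddler.
        request.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
        request.UserAgent = "Mozilla/5.0 (compatible; PostalCodeLookup/1.0)";
        return request;
    }
}
```

Creating the request does not send anything over the wire; the headers only go out once you call GetRequestStream or GetResponse.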

The solution is to set the HttpWebRequest's AllowAutoRedirect property to false, and handle the redirect yourself. In other words, once the first request returns a redirection, you'll need to pull out the URL in the HttpWebResponse's Location: header. Then you'll need to create a new HttpWebRequest (this time a regular GET request with no body, not a POST) for that URL. Remember to send the same cookie! The CookieContainer class makes this very easy.
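Here is a minimal sketch of that POST-then-GET sequence, written against HttpWebRequest directly. The class and method names (ManualRedirect, PostThenGet, ResolveLocation, IsRedirect) are hypothetical, the form body is assumed to be already URL-encoded, and I haven't run this against Canada Post, so treat it as a starting point rather than a finished solution.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

// Hypothetical helper showing manual redirect handling.
public static class ManualRedirect
{
    public static bool IsRedirect(HttpStatusCode status)
    {
        int code = (int)status;
        return code == 301 || code == 302 || code == 303 || code == 307;
    }

    // The Location header may be absolute or relative to the requested URL.
    public static Uri ResolveLocation(Uri requestUri, string location)
    {
        return new Uri(requestUri, location);
    }

    public static string PostThenGet(Uri postUri, string formBody, CookieContainer cookies)
    {
        // Step 1: POST the form, with automatic redirects switched off.
        var post = (HttpWebRequest)WebRequest.Create(postUri);
        post.Method = "POST";
        post.ContentType = "application/x-www-form-urlencoded";
        post.AllowAutoRedirect = false;
        post.CookieContainer = cookies;
        byte[] body = Encoding.UTF8.GetBytes(formBody);
        post.ContentLength = body.Length;
        using (Stream stream = post.GetRequestStream())
        {
            stream.Write(body, 0, body.Length);
        }

        Uri next;
        using (var response = (HttpWebResponse)post.GetResponse())
        {
            if (!IsRedirect(response.StatusCode))
                return ReadBody(response); // no redirect: this is the result page

            string location = response.Headers["Location"];
            if (location == null)
                throw new InvalidOperationException("Redirect without a Location header.");
            next = ResolveLocation(postUri, location);
        }

        // Step 2: plain GET (no body) to the redirect target, same cookies.
        var get = (HttpWebRequest)WebRequest.Create(next);
        get.Method = "GET";
        get.AllowAutoRedirect = false;
        get.CookieContainer = cookies;
        using (var response = (HttpWebResponse)get.GetResponse())
        {
            return ReadBody(response);
        }
    }

    private static string ReadBody(HttpWebResponse response)
    {
        using (var reader = new StreamReader(response.GetResponseStream(), Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Note that the second request sends no content at all. If you stay with WebClient instead, a body-less call such as DownloadString is the GET counterpart, since UploadValues always tries to send a body.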

You also may need to make an additional request (#1 in my list above) in order to set up the session cookie. If I were you, I'd assume that this is required, simply to eliminate it as a problem, and try removing that step later and see if your solution still works.

You'll want to download and use Fiddler (www.fiddlertool.com) to help you with all this. Fiddler allows you to watch the HTTP requests going over the wire, and (via the request builder feature) allows you to create HTTP requests so you can see which headers are actually required.

Justin Grant
@Justin Grant: Thank you very much for this information. I have already downloaded and installed Fiddler, as suggested by AnthonyWJones and EricLaw-MSFT. I just don't understand all these things with the headers and everything. I knew about the redirections, but I didn't know how to proceed in my specific case. Your pointers will surely help, now that I know I had better handle the redirections myself. Your answer is more than complete. I hope I will be able to get straight down to my solution with it. If you don't mind, please come back to see if I have posted anything else (a question, a comment...).
Will Marcouiller
@Justin Grant: As for the second and third request, do I simply have to launch another request from the response URL using the same instance of ServicesWebClient I have coded? How do I get this response URL so that I may upload data again to this address in order to perform the second request?
Will Marcouiller
Hi Will - there are two possibilities. If the URLs are always the same (which I think is the case from the Canada Post site), then you can just hard-code the URLs for each step in your code. If they are different each time, then you'll need to screen-scrape the HTML to find the URLs. You can use the same WebClient instance or a different one. Just make sure that you're setting the cookie and headers correctly for each step.
Justin Grant
Thanks Justin! I will try as you say. I am very grateful for your help. =)
Will Marcouiller
@Justin Grant: Your guidelines have really helped! I saw the redirection, etc. and have been able to better understand the headers. Well, I only know one: "Location", but that's one more than none! That said, I have made progress because of you. Thanks! I'm sure your answer is THE ONE leading to the solution. I just can't get all of it yet. ;)
Will Marcouiller
@Justin: thanks for the extra info. Looking to implement a solution for this problem myself.
p.campbell