I need to obtain delivery tracking details from the Canada Post website, which does not offer an API.

I've constructed a URL that correctly returns the tracking information when entered into a browser, but I can't get the same request to work with cURL (it returns a 500 "We're Sorry" page).


class cURL {
    var $headers;
    var $user_agent;
    var $compression;
    var $cookie_file;
    var $cookies;
    var $proxy;

    // __construct replaces the PHP 4-style cURL() constructor, which PHP 8 no longer calls.
    function __construct($cookies=TRUE,$cookie='cookies.txt',$compression='gzip',$proxy='') {
        $this->headers[] = 'Accept: image/gif, image/x-bitmap, image/jpeg, image/pjpeg';
        $this->headers[] = 'Connection: Keep-Alive';
        $this->headers[] = 'Content-type: application/x-www-form-urlencoded;charset=UTF-8';
        $this->user_agent = 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.0.3705; .NET CLR 1.1.4322; Media Center PC 4.0)';
        $this->compression=$compression;
        $this->proxy=$proxy;
        $this->cookies=$cookies;
        if ($this->cookies == TRUE) $this->cookie($cookie);
    }

    function cookie($cookie_file) {
        if (!file_exists($cookie_file)) {
            // Create an empty cookie file; fclose() needs the handle, not the file name.
            $handle = fopen($cookie_file,'w') or $this->error('The cookie file could not be opened. Make sure this directory has the correct permissions');
            fclose($handle);
        }
        $this->cookie_file=$cookie_file;
    }

    function get($url) {
        $process = curl_init($url);
        curl_setopt($process, CURLOPT_HTTPHEADER, $this->headers);
        curl_setopt($process, CURLOPT_HEADER, 0);
        curl_setopt($process, CURLOPT_USERAGENT, $this->user_agent);
        if ($this->cookies == TRUE) curl_setopt($process, CURLOPT_COOKIEFILE, $this->cookie_file);
        if ($this->cookies == TRUE) curl_setopt($process, CURLOPT_COOKIEJAR, $this->cookie_file);
        curl_setopt($process, CURLOPT_ENCODING, $this->compression);
        curl_setopt($process, CURLOPT_TIMEOUT, 30);
        // Was curl_setopt($cUrl, ...) with a hard-coded placeholder: wrong handle, wrong value.
        if ($this->proxy) curl_setopt($process, CURLOPT_PROXY, $this->proxy);
        curl_setopt($process, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($process, CURLOPT_FOLLOWLOCATION, 1);
        $return = curl_exec($process);
        curl_close($process);
        return $return;
    }

    function post($url,$data) {
        $process = curl_init($url);
        curl_setopt($process, CURLOPT_HTTPHEADER, $this->headers);
        curl_setopt($process, CURLOPT_HEADER, 1);
        curl_setopt($process, CURLOPT_USERAGENT, $this->user_agent);
        if ($this->cookies == TRUE) curl_setopt($process, CURLOPT_COOKIEFILE, $this->cookie_file);
        if ($this->cookies == TRUE) curl_setopt($process, CURLOPT_COOKIEJAR, $this->cookie_file);
        curl_setopt($process, CURLOPT_ENCODING, $this->compression);
        curl_setopt($process, CURLOPT_TIMEOUT, 30);
        if ($this->proxy) curl_setopt($process, CURLOPT_PROXY, $this->proxy);
        curl_setopt($process, CURLOPT_POSTFIELDS, $data);
        curl_setopt($process, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($process, CURLOPT_FOLLOWLOCATION, 1);
        curl_setopt($process, CURLOPT_POST, 1);
        $return = curl_exec($process);
        curl_close($process);
        return $return;
    }

    function error($error) {
        echo "cURL Error\n$error";
        die;
    }
}

$cc = new cURL();
$test = $cc->get('http://www.canadapost.ca/cpotools/apps/track/personal/findByTrackNumber?trackingNumber=x0x0x0x0x0x0x0&trackingType=trackPersonal');
echo $test;

[UPDATE] After removing the Accept header line as per Tim's reply, I now get a page with the following text: 'You are currently visiting our Basic Site. This site is used for low-bandwidth connections, mobile devices and alternative browsers.' But, again, no tracking information.

+1  A: 

I believe the problem is with this line:

$this->headers[] = 'Accept: image/gif, image/x-bitmap, image/jpeg, image/pjpeg';

Add text/html and you should be good. Or just drop that header.
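A minimal sketch of what I mean, assuming a plain GET and nothing else special about the request. The helper names build_tracking_headers() and fetch_page() are made up for this example (they are not part of the class in the question), and the Accept value is just a typical browser-style one that includes text/html:

```php
<?php
// Hypothetical helper: header list whose Accept value includes text/html,
// so the server is allowed to answer with an HTML page.
function build_tracking_headers(): array
{
    return [
        'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Connection: Keep-Alive',
    ];
}

// Hypothetical helper: fetches $url with those headers.
// Returns the response body as a string, or false if the request fails.
function fetch_page(string $url)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_HTTPHEADER, build_tracking_headers());
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);
    $body = curl_exec($ch);
    curl_close($ch);
    return $body;
}
```

Calling fetch_page() with the tracking URL from the question should then at least get past the Accept check; whether the site returns the full tracking page is a separate matter.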

Tim Lytle
Thank you. I've removed that line and now get a page with the following text: 'You are currently visiting our Basic Site. This site is used for low-bandwidth connections, mobile devices and alternative browsers.' Again, this does not display the tracking information. I'm guessing it might have to do with the user agent being sent?
BrynJ
It looks like the URL you're using is redirected. Perhaps it's not a header redirect but a JS- or meta-based redirect. Just a guess.
Tim Lytle
I take it that if that is the case there's nothing I can do to get around the issue?
BrynJ
From a quick look, it seems the page you send the GET vars to takes the data and POSTs it someplace else. I'd take a look at the tracking form at the bottom of the 'mobile' page you get now (using cURL).
Tim Lytle
Well, lo and behold, it now seems to be working great :) I'm honestly not sure what changed, but I now have the tracking info in the returned page. Thanks!
BrynJ
+1  A: 

I used Snoopy for screen scrapes. Totally recommended.

UPDATE: I was able to fetch that content using Snoopy (but I had to modify a single line, line 809).

Here is my code:

<?php
    include('Snoopy.class.php');

    $http = new Snoopy();
    $http->fetch('http://www.canadapost.ca/cpotools/apps/track/personal/findByTrackNumber?trackingNumber=x0x0x0x0x0x0x0&trackingType=trackPersonal');

    echo $http->results;
?>

You need to download the Snoopy library and replace line 809:

$cookie_headers .= $cookieKey."=".urlencode($cookieVal)."; ";

with:

$cookie_headers .= $cookieKey."=".$cookieVal."; ";

And voilà!
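To see what that one-line change does, here is a small standalone sketch. build_cookie_header() is a hypothetical helper that mirrors the two variants of the Snoopy line, and the cookie value is made up. Presumably the server expects the cookie back exactly as it was set, which the urlencode() call prevents whenever the value contains characters like / or =:

```php
<?php
// Hypothetical helper mirroring the two variants of the Snoopy line above:
// $encode = true  -> original behaviour (values passed through urlencode)
// $encode = false -> modified behaviour (values sent back unchanged)
function build_cookie_header(array $cookies, bool $encode): string
{
    $header = '';
    foreach ($cookies as $key => $val) {
        $header .= $key . '=' . ($encode ? urlencode($val) : $val) . '; ';
    }
    return $header;
}

$cookies = ['session' => 'abc/123=='];  // made-up cookie value

echo build_cookie_header($cookies, true), "\n";   // session=abc%2F123%3D%3D;
echo build_cookie_header($cookies, false), "\n";  // session=abc/123==;
```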

inakiabt
The SF site doesn't seem to have much info - care to elaborate on what it provides?
Tim Lytle
I've updated my answer... ;)
inakiabt
Thanks for the suggestion. I've upvoted, but in this instance I think Snoopy is a bit heavyweight for my application.
BrynJ
A: 

How old is this thread? Canada Post certainly does offer an API: http://sellonline.canadapost.ca/DevelopersResources/

Dss