Here's a walkthrough of how I debug page scraping issues with cURL:
- Try the URL in a browser (with LiveHTTPHeaders), and in cURL with CURLOPT_VERBOSE enabled. This serves two purposes: it reveals the HTTP headers in play, and it acts as a simple test of the URL itself.
- If it works in the browser but not in cURL, adjust the cURL request until the HTTP headers it emits match the browser's.
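To make the comparison in the second step concrete, here is a minimal sketch that lists which header names the browser sends and cURL does not. The function name missing_header_names is hypothetical, and the two sample lists are simply the headers that appear in the example further down; names are compared case-insensitively and values are ignored.

```php
<?php
// Hypothetical helper: return the header names that appear in the browser's
// request but not in cURL's. Only names are compared, case-insensitively.
function missing_header_names(array $browserHeaders, array $curlHeaders): array
{
    // Reduce "Name: value" lines to lowercase names.
    $curlNames = array_map(function (string $line): string {
        return strtolower(trim(explode(':', $line, 2)[0]));
    }, $curlHeaders);

    $missing = array();
    foreach ($browserHeaders as $line) {
        $name = trim(explode(':', $line, 2)[0]);
        if (!in_array(strtolower($name), $curlNames, true)) {
            $missing[] = $name;
        }
    }
    return $missing;
}

// Sample input: the Firefox headers and the cURL headers from the example below.
$browser = array(
    'User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.16) Gecko/2009120208 Firefox/3.0.16',
    'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language: en-us,en;q=0.5',
    'Accept-Encoding: gzip,deflate',
    'Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7',
    'Keep-Alive: 300',
    'Connection: keep-alive',
);
$curl = array(
    'Host: www.betjamaica.com',
    'Accept: */*',
);

print_r(missing_header_names($browser, $curl));
```

Headers that both sides send with different values (Accept, in this sample) are not reported; extending the sketch to compare values as well would be a reasonable next step.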
Let's try this with your example.
The URL you provided works in a browser. In cURL, however, turning on CURLOPT_VERBOSE reveals the following:
* About to connect() to www.betjamaica.com port 80
* Trying 72.52.5.34... * connected
* Connected to www.betjamaica.com (72.52.5.34) port 80
> GET /livelines2008/lines.asmx/Load_Latest_Lines?SportType=Football&SportSubType=NFL&GameType=GAME HTTP/1.1
Host: www.betjamaica.com
Accept: */*
* Empty reply from server
* Connection #0 to host www.betjamaica.com left intact
* Closing connection #0
The server's not replying. The only difference between the browser request and the cURL request is that the browser sends more headers. So the thing to do is experiment with adding browser headers until the request starts working. If you copy every header your browser sends, the two requests should be identical and, as a result, the cURL one should work too.
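One practical note on getting at that trace: CURLOPT_VERBOSE writes to STDERR by default, which is easy to lose when the script runs under a web server. A minimal sketch of capturing it in memory instead, assuming the cURL extension is loaded; the function name fetch_with_trace and the shape of the returned array are my own choices, not part of any API:

```php
<?php
// Hypothetical helper: perform a GET request and return the body together
// with cURL's verbose trace, captured in a temporary stream rather than
// written to STDERR.
function fetch_with_trace(string $url, array $headers = array()): array
{
    $trace = fopen('php://temp', 'w+');

    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
    curl_setopt($ch, CURLOPT_VERBOSE, true);
    curl_setopt($ch, CURLOPT_STDERR, $trace); // redirect the verbose output
    if ($headers) {
        curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
    }

    $body  = curl_exec($ch);  // false on failure
    $error = curl_error($ch); // empty string when there was no error

    // Read back everything cURL wrote to the stream.
    rewind($trace);
    $log = stream_get_contents($trace);

    return array('body' => $body, 'error' => $error, 'trace' => $log);
}
```

Calling fetch_with_trace() with the URL under test and printing the 'trace' element should show the same kind of output as above, including the request headers cURL actually sent.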
Here, I've simply copied and pasted my Firefox request headers into the PHP:
$request_url = 'http://www.betjamaica.com/livelines2008/lines.asmx/Load_Latest_Lines?SportType=Football&SportSubType=NFL&GameType=GAME';
$ch = curl_init();
$timeout = 5; // seconds to wait while connecting
curl_setopt($ch, CURLOPT_URL, $request_url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the body instead of printing it
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
curl_setopt($ch, CURLOPT_VERBOSE, true); // write the request/response trace to STDERR
// Request headers copied from Firefox
$headers = array(
    'User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.16) Gecko/2009120208 Firefox/3.0.16',
    'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language: en-us,en;q=0.5',
    // Sent by hand, this header does not make cURL decompress the reply; CURLOPT_ENCODING does
    'Accept-Encoding: gzip,deflate',
    'Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7',
    'Keep-Alive: 300',
    'Connection: keep-alive',
);
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
$data = curl_exec($ch); // false on failure
curl_close($ch);
var_dump($data);
And it works. A little more experimentation reveals that all headers other than User-Agent can be removed:
$headers = array(
    'User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.16)',
);
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
And there you are: apparently, this IIS server refuses to serve any request that lacks a User-Agent header. Add one, and you're good to go.
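The "remove headers until it breaks" experiment can also be scripted. What follows is only a sketch of one way to do it: a single greedy pass that drops each header in turn and keeps the removal if the request still succeeds. The names minimal_headers and $fakeProbe are hypothetical, and the probe here is a stand-in that mimics the behaviour observed above rather than contacting the server. A real probe would run curl_exec() with CURLOPT_HTTPHEADER set to the trial headers and return whether a reply came back.

```php
<?php
// Hypothetical helper: drop headers one at a time and keep each removal
// only if $probe (a callable returning true on success) still succeeds.
function minimal_headers(array $headers, callable $probe): array
{
    $needed = array_values($headers);
    foreach ($headers as $candidate) {
        // Build the current header set without this candidate.
        $trial = array_values(array_filter($needed, function ($h) use ($candidate) {
            return $h !== $candidate;
        }));
        if ($probe($trial)) {
            $needed = $trial; // still works, so this header was not required
        }
    }
    return $needed;
}

// Stand-in probe (an assumption for illustration): succeeds only when a
// User-Agent header is present, like the server in this answer.
$fakeProbe = function (array $headers): bool {
    foreach ($headers as $h) {
        if (stripos($h, 'User-Agent:') === 0) {
            return true;
        }
    }
    return false;
};

$result = minimal_headers(array(
    'User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.16)',
    'Accept-Language: en-us,en;q=0.5',
    'Connection: keep-alive',
), $fakeProbe);

print_r($result);
```

A single pass like this is enough when each header matters independently of the others; if a server required headers in combination, the result would depend on the order in which they are tried.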