<?php
$request_url = 'http://www.betjamaica.com/livelines2008/lines.asmx/Load_Latest_Lines?SportType=Football&SportSubType=NFL&GameType=GAME';
$ch = curl_init();
$timeout = 5;
curl_setopt($ch, CURLOPT_URL, $request_url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
$data = curl_exec($ch);
curl_close($ch);
print "<textarea rows='10' cols='80'>";
print htmlentities($data);
print "</textarea>";
exit();
?>

This produces no results in the textarea but there should be. Other feeds work fine.

+3  A: 

You have a typo there. Try

print htmlentities($data);

instead of

print htmlentities($date);

Also, it would be advisable to set error_reporting to a level that tells you when you use a variable that does not exist. You can do that with:

error_reporting(E_ERROR | E_WARNING | E_PARSE | E_NOTICE);

With E_NOTICE enabled, PHP reports any use of an uninitialized variable (as a notice in PHP 7 and earlier, as a warning since PHP 8).
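
To illustrate, here is a minimal sketch of how that reporting surfaces the $date typo. The $messages array and the custom error handler are only there for this example, so the message can be inspected instead of printed; they are not part of the original script.

```php
<?php
// Report everything, including messages about undefined variables.
error_reporting(E_ALL);

// Hypothetical collector for this sketch: store messages instead of printing them.
$messages = array();
set_error_handler(function ($errno, $errstr) use (&$messages) {
    $messages[] = $errstr;
    return true; // mark as handled so PHP's default output is skipped
});

$data = 'feed contents';
$output = htmlentities($date); // typo: $date was never assigned

restore_error_handler();

// $messages now starts with PHP's "Undefined variable" message for $date,
// and $output is an empty string, which is why the textarea looked empty.
```

Without the custom handler, the same message would simply appear in the page output or the error log, depending on your display_errors and log_errors settings.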

Tatu Ulmanen
Tatu, you mean a typ_o_. :)
Pekka
Yup, can't believe I typoed 'typo' :)
Tatu Ulmanen
fixed but still not retrieving anything
justin
How could you make the statement that other feeds worked fine when they obviously couldn't b/c you were outputting the wrong variable?
Mike B
A: 

Here's a walkthrough of how I debug page scraping issues with cURL:

  1. Try the URL in a browser (with LiveHTTPHeaders), and in cURL with CURLOPT_VERBOSE enabled. This serves two purposes: it reveals the HTTP headers in play, and it is a simple test of the URL itself.
  2. If it works in the browser but not in cURL, adjust the cURL request until the HTTP headers it emits match the browser's.
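
If you are running the script through a web server, the verbose trace goes to stderr and is easy to miss. One way around that, sketched below, is to point CURLOPT_STDERR at an in-memory stream and read it back afterwards. The function name fetch_with_verbose_log and the shape of its return value are my own invention for this example.

```php
<?php
// Hypothetical helper: run a GET request and capture cURL's verbose trace.
function fetch_with_verbose_log($url)
{
    $log = fopen('php://temp', 'w+'); // in-memory stream that receives the trace

    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
    curl_setopt($ch, CURLOPT_VERBOSE, true);
    curl_setopt($ch, CURLOPT_STDERR, $log); // write the trace to $log, not stderr

    $body  = curl_exec($ch);  // false when the transfer fails
    $error = curl_error($ch); // empty string when the transfer succeeded

    // Read back everything cURL wrote during the transfer.
    rewind($log);
    $trace = stream_get_contents($log);

    return array('body' => $body, 'error' => $error, 'trace' => $trace);
}
```

Printing the trace element inside a pre or textarea block gives you the request headers cURL actually sent, which is what you compare against the browser.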

Let's try this with your example.

The URL you provided works in a browser, however...

Turning on CURLOPT_VERBOSE reveals the following:

* About to connect() to www.betjamaica.com port 80
*   Trying 72.52.5.34... * connected
* Connected to www.betjamaica.com (72.52.5.34) port 80
> GET /livelines2008/lines.asmx/Load_Latest_Lines?SportType=Football&SportSubType=NFL&GameType=GAME HTTP/1.1
Host: www.betjamaica.com
Accept: */*

* Empty reply from server
* Connection #0 to host www.betjamaica.com left intact
* Closing connection #0

The server's not replying. The only difference between the browser request and the cURL request is that the browser sends more headers. So the thing to do is experiment with adding browser headers until it starts working. If you copy every header your browser sends, the two requests should be identical, and the cURL request should work as well.

Here, I've simply copied and pasted my Firefox request headers into the PHP:

$request_url = 'http://www.betjamaica.com/livelines2008/lines.asmx/Load_Latest_Lines?SportType=Football&SportSubType=NFL&GameType=GAME';
$ch = curl_init();
$timeout = 5;
curl_setopt($ch, CURLOPT_URL, $request_url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
curl_setopt($ch, CURLOPT_VERBOSE, true);
$headers = array(
'User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.16) Gecko/2009120208 Firefox/3.0.16',
'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Accept-Language: en-us,en;q=0.5',
'Accept-Encoding: gzip,deflate',
'Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7',
'Keep-Alive: 300',
'Connection: keep-alive',
);
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
$data = curl_exec($ch);
curl_close($ch);
var_dump($data);

And it works. A little more experimentation reveals that all headers other than User-Agent can be removed:

$headers = array(
'User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.16)',
);
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);

And there you are: apparently, this IIS server is refusing to serve any requests without a User-Agent. Add one, and you're good to go.
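
As a variation, cURL also has a dedicated CURLOPT_USERAGENT option, and it is worth checking the return value of curl_exec so that a failed request does not silently turn into an empty textarea. The sketch below shows both; the function names feed_request_options and fetch_feed are hypothetical, and any user agent string you pass in is your own choice. I have not verified which strings this particular server accepts beyond the one shown above.

```php
<?php
// Hypothetical helper: build the cURL options for the feed request.
function feed_request_options($url, $userAgent)
{
    return array(
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_CONNECTTIMEOUT => 5,
        CURLOPT_USERAGENT      => $userAgent, // sets the User-Agent request header
    );
}

// Hypothetical helper: fetch the URL and fail loudly instead of returning false.
function fetch_feed($url, $userAgent)
{
    $ch = curl_init();
    curl_setopt_array($ch, feed_request_options($url, $userAgent));

    $data = curl_exec($ch);
    if ($data === false) {
        // curl_error() carries the same text the verbose trace shows,
        // and curl_errno() the numeric code (CURLE_GOT_NOTHING for an empty reply).
        throw new RuntimeException(curl_error($ch), curl_errno($ch));
    }
    return $data;
}
```

In the original script you would then replace the curl_setopt and curl_exec lines with a single call to fetch_feed($request_url, ...) wrapped in a try/catch, and print the exception message when the request fails.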

Frank Farmer