I'm trying to download the contents of a web page using PHP. When I issue the command:

$f = file_get_contents("http://mobile.mybustracker.co.uk/mobile.php?searchMode=2");

It returns a page that reports that the server is down. Yet when I paste the same URL into my browser I get the expected page.

Does anyone have any idea what's causing this? Does file_get_contents transmit any headers that differentiate it from a browser request?

+5  A: 

Yes, there are differences: a browser typically sends many additional HTTP headers, and the headers that both send probably don't have the same values.

Here, after doing a couple of tests, it seems that passing the HTTP header called Accept is necessary.

This can be done using the third parameter of file_get_contents, which accepts a stream context carrying additional options:

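$opts = array('http' =>
    array(
        'method'  => 'GET',
        // Not required by this server; kept for reference:
        //'user_agent' => "Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2) Gecko/20100301 Ubuntu/9.10 (karmic) Firefox/3.6",
        'header' => array(
            // Single-quoted strings keep backslashes literally, so the value must use */* and not *\/*
            'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'
        ),
    )
);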
$context  = stream_context_create($opts);

$f = file_get_contents("http://mobile.mybustracker.co.uk/mobile.php?searchMode=2", false, $context);
echo $f;

With this, I'm able to get the HTML code of the page.
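
If the request still returns an unexpected page, it can help to check what the server actually answered. The following is a minimal sketch, assuming PHP's `$http_response_header` variable (filled in by the HTTP wrapper after the call) and the `ignore_errors` context option. The helper name `status_code_from_headers` is hypothetical and not part of PHP.

```php
<?php
// Hypothetical helper: extract the numeric status code from the header lines
// collected by the HTTP wrapper. The last status line wins, so the code
// reported after a redirect is the one of the final response.
function status_code_from_headers(array $headers): ?int
{
    $code = null;
    foreach ($headers as $line) {
        if (preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#', $line, $m)) {
            $code = (int) $m[1];
        }
    }
    return $code;
}

// Usage sketch (performs a network request, so it is left commented out):
// $context = stream_context_create(array('http' => array(
//     'header'        => array('Accept: text/html'),
//     'ignore_errors' => true, // return the body even on 4xx/5xx responses
// )));
// $f = file_get_contents("http://mobile.mybustracker.co.uk/mobile.php?searchMode=2", false, $context);
// echo status_code_from_headers($http_response_header), "\n";
```

Note that a "server is down" page may still arrive with a 200 status, so the body should be inspected as well as the status code.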


Notes :

  • I first tested passing the User-Agent, but it doesn't seem to be necessary, which is why the corresponding line is left as a comment.
  • The value used for the Accept header is the one Firefox sent when I requested that page, before I tried with file_get_contents.
    • Some other values might be OK, but I didn't run any tests to determine which value is actually required.
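
To narrow down which Accept value the server needs, one could try a few candidates in turn. This is only a rough sketch: the helper `accept_context_options` and the candidate list are illustrative assumptions and have not been checked against this server.

```php
<?php
// Hypothetical helper: build stream-context options for a GET request
// that sends a single Accept header with the given value.
function accept_context_options(string $accept): array
{
    return array('http' => array(
        'method' => 'GET',
        'header' => array('Accept: ' . $accept),
    ));
}

// Candidate values, from most to least specific (assumed, untested).
$candidates = array(
    'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'text/html',
    '*/*',
);

// Usage sketch (each iteration performs a network request):
// foreach ($candidates as $accept) {
//     $context = stream_context_create(accept_context_options($accept));
//     $f = file_get_contents("http://mobile.mybustracker.co.uk/mobile.php?searchMode=2", false, $context);
//     echo $accept, ' => ', ($f === false ? 'failed' : strlen($f) . ' bytes'), "\n";
// }
```

Because the "server is down" page is itself returned as content, the bodies have to be compared directly; a successful call alone does not show that the header was accepted.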


For more information, you can take a look at:

Pascal MARTIN
A: 

Replace all spaces in the URL with %20.

jacob