views: 378
answers: 5

I know that file_get_contents() can be used to retrieve the source of a webpage, but I want to know the most efficient way to do it.

I have a class I wrote a long time ago that uses something like this:

    $this->socket = fsockopen($this->host, 80);

    // HTTP header lines should end with CRLF ("\r\n"), not a bare "\n"
    fputs($this->socket, 'GET ' . $this->target . ' HTTP/1.0' . "\r\n");
    fputs($this->socket, 'Host: ' . $this->host . "\r\n");
    fputs($this->socket, 'User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9b5) Gecko/2008050509 Firefox/3.0b5' . "\r\n");
    fputs($this->socket, 'Connection: close' . "\r\n\r\n");

    $this->source = '';

    // read the response in chunks until the server closes the connection
    while (!feof($this->socket)) {
        $this->source .= fgets($this->socket, 128);
    }

    fclose($this->socket);

Is this the best way? By most efficient I mean the approach that returns results the fastest.

+3  A: 

The code you have is probably the fastest and simplest way of doing what you're asking about. However, it isn't very flexible if you need more complex behaviour (such as POST requests, or HTTP/1.1 features like Content-Encoding and Transfer-Encoding).

If you want something that will handle those more complex cases, use PHP's cURL extension.
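
For example, a minimal cURL fetch might look like this (a sketch, not production code; the URL is a placeholder):

    // Sketch of a cURL GET request; the URL is a placeholder.
    $ch = curl_init('http://www.example.org/');
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body rather than printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    curl_setopt($ch, CURLOPT_ENCODING, '');         // accept any encoding cURL supports (gzip, deflate, ...)
    $source = curl_exec($ch);
    if ($source === false) {
        echo 'cURL error: ' . curl_error($ch);
    }
    curl_close($ch);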

SoapBox
+4  A: 

file_get_contents() is the best and most efficient way. Either way, though, there is little difference, because the bottleneck is the network, not the processor. Code readability should also be a concern.
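
If you do need a little more control (a timeout, a custom User-Agent), file_get_contents() can still be used with a stream context. A minimal sketch, with placeholder values:

    // file_get_contents() with a stream context; the URL, timeout
    // and User-Agent below are placeholder values.
    $context = stream_context_create(array(
        'http' => array(
            'method'  => 'GET',
            'timeout' => 5, // seconds
            'header'  => "User-Agent: MyFetcher/1.0\r\n",
        ),
    ));
    $source = file_get_contents('http://www.example.org/', false, $context);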

Consider this benchmark as well: http://www.ebrueggeman.com/php_benchmarking_fopen.php

carl
A: 

Also check out Zend Framework's Zend_Http_Client class. It supports redirects, among other things.
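
A minimal sketch, assuming Zend Framework 1 is on your include_path (the URL is a placeholder):

    // Zend_Http_Client sketch; assumes Zend Framework 1 on the include_path.
    require_once 'Zend/Http/Client.php';

    $client   = new Zend_Http_Client('http://www.example.org/');
    $response = $client->request('GET'); // returns a Zend_Http_Response
    $source   = $response->getBody();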

raspi
A: 

You won't get better performance than the built-in file_get_contents() with homebrew code like this. In fact, the repeated concatenation of chunks as short as 128 bytes (why so small?) will perform rather badly.

For HTTP there are reasons to do it yourself or to use an external library (see the sketch after this list), for example:

  • you need control over network timeouts

  • you want to stream content directly from the socket instead of accumulating it

but performance isn't one of them; the simple built-in PHP function will be limited only by the network speed, which is something you can't do anything about.
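
To illustrate those two points, here is a sketch of a socket fetch with explicit timeouts that hands each chunk to a callback as it arrives. The host, timeout values and process_chunk() function are hypothetical placeholders:

    // fsockopen() with explicit timeouts, streaming chunks as they arrive.
    // The host, timeouts and process_chunk() callback are placeholders.
    $socket = fsockopen('www.example.org', 80, $errno, $errstr, 5); // 5 s connect timeout
    if ($socket === false) {
        die("connect failed: $errstr ($errno)");
    }
    stream_set_timeout($socket, 10); // 10 s read timeout

    fwrite($socket, "GET / HTTP/1.0\r\nHost: www.example.org\r\nConnection: close\r\n\r\n");

    while (!feof($socket)) {
        $chunk = fgets($socket, 4096);
        if ($chunk !== false) {
            process_chunk($chunk); // hypothetical: handle data as it streams in
        }
        $meta = stream_get_meta_data($socket);
        if ($meta['timed_out']) {
            break; // give up instead of hanging on a dead connection
        }
    }
    fclose($socket);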

bobince
+1  A: 

Not sure? Let's test! The script below fetches http://www.example.org ten times with each method:

    $t = microtime(true);
    for ($i = 0; $i < 10; $i++) {
        $source = file_get_contents('http://www.example.org');
    }
    print 'file_get_contents: ' . (microtime(true) - $t);
    print '<br>';

    $t = microtime(true);
    for ($i = 0; $i < 10; $i++) {
        $socket = fsockopen('www.example.org', 80);
        // HTTP header lines should end with CRLF ("\r\n")
        fputs($socket, 'GET / HTTP/1.0' . "\r\n");
        fputs($socket, 'Host: www.example.org' . "\r\n");
        fputs($socket, 'User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9b5) Gecko/2008050509 Firefox/3.0b5' . "\r\n");
        fputs($socket, 'Connection: close' . "\r\n\r\n");
        $source = '';
        while (!feof($socket)) {
            $source .= fgets($socket, 128);
        }
        fclose($socket);
    }
    print 'fsockopen: ' . (microtime(true) - $t);

1st run:

file_get_contents: 3.4470698833466
fsockopen: 6.3937518596649

2nd run:

file_get_contents: 3.5667569637299
fsockopen: 6.4959270954132

3rd run:

file_get_contents: 3.4623680114746
fsockopen: 6.4249370098114

So, since file_get_contents() is both faster and more concise, I'm going to declare it the winner!

Paolo Bergantino