I am using curl and PHP to find out information about a given URL (e.g. HTTP status code, MIME type, HTTP redirect location, page title, etc.).

  
 $ch = curl_init($url);
 $useragent = "Mozilla/5.0 (X11; U; Linux x86_64; ga-GB) AppleWebKit/532.9 (KHTML, like Gecko) Chrome/5.0.307.11 Safari/532.9";
 curl_setopt($ch, CURLOPT_HTTPHEADER, array(
     "Accept: application/rdf+xml;q=0.9, application/json;q=0.6, application/xml;q=0.5, application/xhtml+xml;q=0.3, text/html;q=0.2, */*;q=0.1"
 ));
 curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);     // follow redirects
 curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE); // skip SSL certificate checks
 curl_setopt($ch, CURLOPT_USERAGENT, $useragent);
 curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);     // return the body as a string
 $content = curl_exec($ch);
 $chinfo = curl_getinfo($ch);                     // status code, mimetype, redirect info etc.
 curl_close($ch);

This generally works well. However, if the URL points to a large file, I get a fatal error:

Fatal error: Allowed memory size of 16777216 bytes exhausted (tried to allocate 14421576 bytes)

Is there any way of preventing this? For example, by telling curl to give up if the file is too large, or by catching the error?

As a workaround, I've added

 curl_setopt($ch, CURLOPT_TIMEOUT, 3);

which assumes that any file large enough to exhaust the allowed memory will take longer than 3 seconds to load, but this is far from satisfactory.

A: 

Have you tried using CURLOPT_FILE to save the file directly to disk instead of using memory? You can even specify /dev/null to put it nowhere at all...
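For instance, a minimal sketch of that approach, assuming $url is set as in the question:

 // Stream the body to /dev/null so it never accumulates in PHP's memory.
 $fp = fopen('/dev/null', 'w');
 $ch = curl_init($url);
 curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
 curl_setopt($ch, CURLOPT_FILE, $fp); // write the body to $fp instead of returning it
 curl_exec($ch);
 $chinfo = curl_getinfo($ch);         // status code, mimetype etc. are still available
 curl_close($ch);
 fclose($fp);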

Or, you can use CURLOPT_WRITEFUNCTION to set a custom data-writing function. Have the function just scan the headers and then throw away the actual data.
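A rough sketch of that idea (assumes PHP 5.3+ for the anonymous function): each chunk is discarded as it arrives, and the metadata is read from curl_getinfo() afterwards. Returning less than strlen($data) would make curl abort the transfer instead, which also covers the "give up if the file is too large" case.

 $ch = curl_init($url);
 curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
 curl_setopt($ch, CURLOPT_WRITEFUNCTION, function ($ch, $data) {
     // Claim the bytes were handled, but keep nothing in memory.
     return strlen($data);
 });
 curl_exec($ch);
 $chinfo = curl_getinfo($ch);
 curl_close($ch);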

Alternatively, give PHP some more memory via php.ini.

Borealid
A: 

You need to edit the php.ini file and increase the allowed memory: search for memory_limit in the php.ini file.

memory_limit = 256M
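
If you can't edit php.ini, the limit can usually also be raised at runtime (assuming the host permits it):

 ini_set('memory_limit', '256M');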
NAVEED
A: 

If you're getting header information, then why not use a HEAD request? That avoids pulling the whole page into your maximum 16 MiB memory allocation.

curl_setopt($ch, CURLOPT_NOBODY, true); // send a HEAD request, skip the body
curl_setopt($ch, CURLOPT_HEADER, true); // include the headers in the output

Then, for the page title, fetch the page with file_get_contents() instead, as it works with PHP's native memory allocation.
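
For instance, a sketch of the title extraction (the regular expression is only an illustration and assumes a simple HTML page):

 $html = file_get_contents($url);
 if (preg_match('~<title>(.*?)</title>~is', $html, $m)) {
     $title = trim($m[1]);
 }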

Delan Azabani
Originally I used this solution, but I found that some websites (e.g. Amazon) didn't accept HEAD requests.
lucas