I was looking at PHP docs for fsockopen and whatnot and they say you can't use filesize() on a remote file without doing some crazy things with ftell or something (not sure what they said exactly), but I had a good thought about how to do it:

$file = file_get_contents("http://www.google.com");
$filesize = mb_strlen($file) / 1000; //KBs, mb_* in case file contains unicode

Would this be a good method? It seemed so simple and good to use at the time, just want to get any thoughts if this could run into problems or not be the true file size.

By the way, I only want to use this on text (websites), not binary files.

A: 

It will fetch the whole file and then calculate the file size (or rather, the string length) from the retrieved data. Usually filesize() can read the size directly from the filesystem without reading the whole file first.

So this will be rather slow, and it will fetch the whole file every time before it can report the file size (string length).
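
As a rough illustration of why string length and size in bytes are not the same thing, here is a small sketch. The sample string is made up, and it assumes the mbstring extension is loaded. strlen() counts bytes while mb_strlen() counts characters, so for text containing multi-byte characters the two disagree, and strlen() is the one closer to what filesize() would report.

```php
<?php
// Hypothetical sample: "cafés" encoded as UTF-8; the "é" occupies two bytes.
$text = "caf\xC3\xA9s";

$bytes = strlen($text);              // counts bytes
$chars = mb_strlen($text, 'UTF-8');  // counts characters

printf("%d bytes, %d characters\n", $bytes, $chars);
// prints "6 bytes, 5 characters"
```

So if the goal is a size in KB for a downloaded page, dividing strlen($file) by 1000 (or 1024) is probably the safer choice.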

knittl
Huh? If it's external, it's not on your filesystem, and you'd need to use ftell etc. to read the bytes as they stream in and calculate the size from that. I was just asking whether my approach would be simpler for text-based external content.
John D.
+2  A: 

You should look at the get_headers() function. It returns an array of the HTTP headers sent in response to a request (an associative array if you pass a true second argument). The Content-Length header may be a better judge of the size of the actual content, if it's present.

That being said, you really should use either cURL or streams to do a HEAD request instead of a GET. Content-Length should be present, which saves you the transfer. It will be both faster and more accurate.
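
A minimal sketch of the streams variant, in case it helps. The function names content_length_from_headers() and remote_content_length() are made up for this example. It assumes PHP 7.1 or later, since that is when get_headers() gained its third (context) argument; on older versions stream_context_set_default() would be needed instead.

```php
<?php
// Pull Content-Length out of an array shaped like get_headers($url, true) output.
// Returns null when the server did not send the header.
function content_length_from_headers(array $headers) {
    // HTTP header names are case-insensitive, so normalize the keys first.
    $headers = array_change_key_case($headers, CASE_LOWER);
    if (!isset($headers['content-length'])) {
        return null;
    }
    $value = $headers['content-length'];
    // After redirects there is one value per response; the last one is the final page.
    if (is_array($value)) {
        $value = end($value);
    }
    return (int) $value;
}

// Ask for the headers only, using HEAD instead of the default GET.
function remote_content_length($url) {
    $context = stream_context_create(array('http' => array('method' => 'HEAD')));
    $headers = get_headers($url, true, $context);
    return $headers === false ? null : content_length_from_headers($headers);
}
```

Calling var_dump(remote_content_length("http://www.google.com")); would then give either an integer or NULL, depending on whether the server sends the header.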

Charles
Sounds good, but Content-Length doesn't seem to be sent by any of the websites I've looked at, so I'm kinda iffy about that. I may still use it instead of my method when it exists.
John D.
It can vary depending on the type of content. If an actual physical file is being served, it should be present. If the content is compressed, it should be present. If the content is generated by a script, it might not be present...
Charles
+1  A: 

This answer requires PHP 5 and cURL. It first checks the headers. If Content-Length isn't specified, it uses cURL to download the content and check the size (the file is not saved anywhere though--just temporarily in memory).

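<?php
echo get_remote_size("http://www.google.com/");

// Returns the size in bytes of the resource at $url.
function get_remote_size($url) {
    // Cheap path: fetch the headers and use Content-Length if the server sent it.
    $headers = get_headers($url, 1);
    if ($headers !== false) {
        // Header names are case-insensitive, so normalize the keys before the lookup.
        $headers = array_change_key_case($headers, CASE_LOWER);
        if (isset($headers['content-length'])) {
            // After redirects get_headers() returns one value per response; use the last.
            $length = $headers['content-length'];
            return (int) (is_array($length) ? end($length) : $length);
        }
    }

    // Fallback: download the body into memory and let cURL report the byte count.
    $c = curl_init();
    curl_setopt_array($c, array(
        CURLOPT_URL => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true, // follow redirects, as get_headers() does
        CURLOPT_HTTPHEADER => array('User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1.3) Gecko/20090824 Firefox/3.5.3'),
        ));
    curl_exec($c);
    $size = curl_getinfo($c, CURLINFO_SIZE_DOWNLOAD);
    curl_close($c);
    return $size;
}
?>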
Mark Eirich
You never got a point for this answer of yours - but it is gold!
Tal Galili