I'm trying to put together a function that receives a file path, identifies what it is, sets the appropriate headers, and serves it just like Apache would.

I'm doing this because I need to use PHP to process some information about the request before serving the file.

Speed is critical

virtual() isn't an option

Must work in a shared hosting environment where the user has no control of the web server (Apache/nginx, etc)

Here's what I've got so far:

File::output($path);

<?php
class File {
static function output($path) {
    // Check if the file exists
    if(!File::exists($path)) {
        header('HTTP/1.0 404 Not Found');
        exit();
    }

    // Set the content-type header
    header('Content-Type: '.File::mimeType($path));

    // Handle caching
    $fileModificationTime = gmdate('D, d M Y H:i:s', File::modificationTime($path)).' GMT';
    // Read the request header from $_SERVER; getallheaders() is not always available outside Apache
    if(isset($_SERVER['HTTP_IF_MODIFIED_SINCE']) && $_SERVER['HTTP_IF_MODIFIED_SINCE'] == $fileModificationTime) {
        header('HTTP/1.1 304 Not Modified');
        exit();
    }
    header('Last-Modified: '.$fileModificationTime);

    // Read the file
    readfile($path);

    exit();
}

static function mimeType($path) {
    // Extract the extension; use an empty string when the path has none
    $suffix = preg_match("|\.([a-z0-9]{2,4})$|i", $path, $fileSuffix) ? strtolower($fileSuffix[1]) : '';

    switch($suffix) {
        case 'js' :
            return 'text/javascript';
        case 'json' :
            return 'application/json';
        case 'jpg' :
        case 'jpeg' :
        case 'jpe' :
            return 'image/jpeg';
        case 'png' :
        case 'gif' :
        case 'bmp' :
        case 'tiff' :
            return 'image/'.strtolower($fileSuffix[1]);
        case 'css' :
            return 'text/css';
        case 'xml' :
            return 'application/xml';
        case 'doc' :
            return 'application/msword';
        case 'docx' :
            return 'application/vnd.openxmlformats-officedocument.wordprocessingml.document';
        case 'xls' :
        case 'xlt' :
        case 'xlm' :
        case 'xld' :
        case 'xla' :
        case 'xlc' :
        case 'xlw' :
        case 'xll' :
            return 'application/vnd.ms-excel';
        case 'ppt' :
        case 'pps' :
            return 'application/vnd.ms-powerpoint';
        case 'rtf' :
            return 'application/rtf';
        case 'pdf' :
            return 'application/pdf';
        case 'html' :
        case 'htm' :
        case 'php' :
            return 'text/html';
        case 'txt' :
            return 'text/plain';
        case 'mpeg' :
        case 'mpg' :
        case 'mpe' :
            return 'video/mpeg';
        case 'mp3' :
            return 'audio/mpeg';
        case 'wav' :
            return 'audio/wav';
        case 'aiff' :
        case 'aif' :
            return 'audio/aiff';
        case 'avi' :
            return 'video/x-msvideo';
        case 'wmv' :
            return 'video/x-ms-wmv';
        case 'mov' :
            return 'video/quicktime';
        case 'zip' :
            return 'application/zip';
        case 'tar' :
            return 'application/x-tar';
        case 'swf' :
            return 'application/x-shockwave-flash';
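        default :
            // Fall back to content sniffing when available, otherwise to a generic binary type
            if(function_exists('mime_content_type')) {
                $detected = mime_content_type($path);
                if($detected !== false) {
                    return $detected;
                }
            }
            return 'application/octet-stream';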
    }
}
}
?>
+25  A: 

The fastest way: don't. Look into the X-Sendfile header (on nginx the equivalent is X-Accel-Redirect); other web servers have similar mechanisms. This means you can still do access control etc. in PHP but delegate the actual sending of the file to a web server designed for that.

P.S: I get chills just thinking about how much more efficient this is with nginx, compared to reading and sending the file in PHP. Just think of 100 people downloading a file: with PHP + Apache, being generous, that's probably 100 * 15 MB = 1.5 GB (approx., shoot me) of RAM right there. Nginx will just hand off sending the file to the kernel, and then it's loaded directly from the disk into the network buffers. Speedy!

P.P.S: And, with this method you can still do all the access control, database stuff you want.

Jords
Let me just add that this also exists for Apache: http://www.jasny.net/articles/how-i-php-x-sendfile/ . You can make the script sniff out the server and send the appropriate headers. If none exist (and the user has no control over the server as per the question), fall back to a normal `readfile()`
Fanis
Now this is just awesome - I always hated bumping up the memory limit in my virtual hosts just so that PHP would serve up a file, and with this I shouldn't have to. I'll be trying it out very soon.
Greg W
And for credit where credit is due, [Lighttpd](http://www.lighttpd.net/) was the first web server to implement this (And the rest copied it, which is fine since it's a great idea. But give credit where credit is due)...
ircmaxell
This answer keeps getting upvoted, but it won't work in an environment where the web server and its settings are out of the user's control.
Kirk
You actually added that to your question after I posted this answer. And if performance is an issue, then the web server has to be within your control.
Jords
A: 

Try this:

function file_type($file) {
    // Turn a relative path into a full URL on the current host
    if(!preg_match("/(?:https?|ftp):\/\//i", $file)) {
        $protocol = explode("/", $_SERVER['SERVER_PROTOCOL']);
        $protocol = strtolower($protocol[0]);
        $domain = $protocol."://".$_SERVER['HTTP_HOST']."/";
        $file = $domain.$file;
    }
    // Request the URL and return the Content-Type reported by the web server
    $gh = get_headers($file, true);
    return $gh["Content-Type"];
}

echo file_type("foo.txt");
Jet
+10  A: 
header('Location: ' . $path);
exit(0);

Let Apache do the work for you.

amphetamachine
That's simpler than the X-Sendfile method, but it will not work if you need to restrict access to a file to, say, logged-in users only. If you don't need to do that then it's great!
Jords
Also add a referrer check with mod_rewrite.
sanmai
You could auth before passing the header. That way you're also not pumping tons of stuff through PHP's memory.
UltimateBrent
+8  A: 

My previous answer was partial and not well documented. Here is an update with a summary of the solutions from it and from others in the discussion.

The solutions are ordered from best to worst, but also from the one needing the most control over the web server to the one needing the least. There doesn't seem to be an easy way to have one solution that is both fast and works everywhere.


Using the X-SendFile header

As documented by others, it's actually the best way. The idea is that you do your access control in PHP and then, instead of sending the file yourself, you tell the web server to do it.

The basic PHP code is:

header("X-Sendfile: $file_name");
header("Content-type: application/octet-stream");
header('Content-Disposition: attachment; filename="' . basename($file_name) . '"');

Where $file_name is the full path on the file system.

The main problem with this solution is that it needs to be allowed by the web server, and it either isn't installed by default (Apache), isn't active by default (lighttpd), or needs a specific configuration (nginx).

Apache

Under Apache, if you use mod_php, you need to install a module called mod_xsendfile and then configure it (either in the Apache config or in .htaccess if you allow it):

XSendFile on
XSendFilePath /home/www/example.com/htdocs/files/

With this module the file path can be either absolute or relative to the specified XSendFilePath.

Lighttpd

mod_fastcgi supports this when configured with:

"allow-x-send-file" => "enable" 

The feature is documented on the lighttpd wiki. They document the X-LIGHTTPD-send-file header, but the X-Sendfile name also works.

Nginx

On nginx you can't use the X-Sendfile header; you must use their own header, named X-Accel-Redirect. It is enabled by default, and the only real difference is that its argument should be a URI, not a file system path. The consequence is that you must define a location marked as internal in your configuration to keep clients from finding the real file URL and going directly to it. Their wiki contains a good explanation of this.
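
A minimal sketch of the nginx variant, assuming the nginx configuration has a location such as /protected/ that is marked internal and mapped to the directory holding the files. The prefix, the function name accelRedirectHeaders() and the example file name are made up for illustration; only the X-Accel-Redirect header itself comes from nginx. The path segments are URL-encoded on the assumption that the server expects an escaped URI in this header.

```php
<?php
// Build the headers that hand a download over to nginx.
// $internalPrefix is a hypothetical location (e.g. "/protected/") that the
// nginx configuration marks as "internal"; $relativePath is the file below it.
function accelRedirectHeaders($internalPrefix, $relativePath)
{
    // Encode each path segment so spaces and other special characters
    // survive in the URI, but keep the slashes between segments.
    $segments = array_map('rawurlencode', explode('/', ltrim($relativePath, '/')));
    $uri = rtrim($internalPrefix, '/') . '/' . implode('/', $segments);

    return array(
        'X-Accel-Redirect: ' . $uri,
        'Content-Type: application/octet-stream',
        'Content-Disposition: attachment; filename="' . basename($relativePath) . '"',
    );
}

// Usage, once your access checks have passed:
// foreach (accelRedirectHeaders('/protected/', 'reports/q1 2010.pdf') as $line) {
//     header($line);
// }
// exit;
```

Returning the header lines instead of calling header() directly keeps the function easy to check from the command line.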

Symlinks and Location header

You could use symlinks and redirect to them: create a symlink to your file with a random name when a user is authorized to access the file, then redirect the user to it using:

header("Location: " . $url_of_symlink);

Obviously you'll need a way to prune them, either when the script that creates them is called or via cron (on the machine if you have access, or via some webcron service otherwise).

Under apache you need to be able to enable FollowSymLinks in a .htaccess or in the apache config.
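
A rough sketch of the symlink approach, assuming PHP 7+ (for random_bytes()), a host that permits symlink(), and a web-accessible link directory that already exists. The function names createDownloadLink() and pruneDownloadLinks() and the directory layout are hypothetical.

```php
<?php
// Create a symlink with an unguessable name inside a web-accessible
// directory and return that name (or false on failure).
// $linkDir is a hypothetical directory such as .../htdocs/dl.
function createDownloadLink($realFile, $linkDir)
{
    // 16 random bytes give 32 hex characters; keep the extension so the
    // web server can still pick a Content-Type from it.
    $name = bin2hex(random_bytes(16));
    $ext = pathinfo($realFile, PATHINFO_EXTENSION);
    if ($ext !== '') {
        $name .= '.' . $ext;
    }
    if (!symlink($realFile, $linkDir . '/' . $name)) {
        return false;
    }
    return $name;
}

// Remove links older than $maxAge seconds. Meant to be run from cron or
// at the start of the script that creates links. Returns the number removed.
function pruneDownloadLinks($linkDir, $maxAge)
{
    $removed = 0;
    foreach (scandir($linkDir) as $entry) {
        $path = $linkDir . '/' . $entry;
        if (is_link($path)) {
            // lstat() reports on the link itself, not on its target.
            $info = lstat($path);
            if ($info !== false && time() - $info['mtime'] > $maxAge) {
                unlink($path);
                $removed++;
            }
        }
    }
    return $removed;
}
```

The returned name is what you would append to the public URL of the link directory before sending the Location header.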

Access control by IP and Location header

Another hack is to generate Apache access files from PHP that explicitly allow the user's IP. Under Apache this means using the Allow from directive of mod_authz_host (mod_access).

The problem is that locking access to the file (as multiple users may want to do this at the same time) is non trivial and could lead to some users waiting a long time. And you still need to prune the file anyway.

Obviously another problem would be that multiple people behind the same IP could potentially access the file.
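
To make the locking issue concrete, here is a sketch of a read-modify-write of the access file under an exclusive lock. It assumes the older "Order / Deny / Allow from" syntax mentioned above; the function name allowIp() and the file layout are hypothetical, and pruning old entries is left out.

```php
<?php
// Append "Allow from <ip>" to an access file while holding an exclusive
// lock, so two concurrent requests cannot overwrite each other's entry.
function allowIp($htaccessPath, $ip)
{
    // Never write unvalidated input into a server config file.
    if (filter_var($ip, FILTER_VALIDATE_IP) === false) {
        return false;
    }
    // 'c+' opens for reading and writing and creates the file if missing,
    // without truncating it.
    $fh = fopen($htaccessPath, 'c+');
    if ($fh === false) {
        return false;
    }
    // Blocks until the lock is free: this is where other users may wait.
    if (!flock($fh, LOCK_EX)) {
        fclose($fh);
        return false;
    }
    $current = stream_get_contents($fh);
    if ($current === '' || $current === false) {
        $current = "Order deny,allow\nDeny from all\n";
    }
    $line = 'Allow from ' . $ip . "\n";
    if (strpos($current, $line) === false) {
        // Rewrite the whole file with the new entry appended.
        ftruncate($fh, 0);
        rewind($fh);
        fwrite($fh, $current . $line);
        fflush($fh);
    }
    flock($fh, LOCK_UN);
    fclose($fh);
    return true;
}
```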

When everything else fails

If you really don't have any way to get your web server to help you, the only remaining solution is readfile. It is available in all PHP versions currently in use and works pretty well (but isn't really efficient).
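
A sketch of the readfile fallback, split so that the header part can be checked on its own. The function names readfileHeaders() and sendWithReadfile() are hypothetical. Closing any active output buffers first is there so that the file is streamed to the client instead of being collected in memory.

```php
<?php
// Build the response headers for a plain PHP download, or return null
// when the file is missing or unreadable.
function readfileHeaders($path)
{
    if (!is_file($path) || !is_readable($path)) {
        return null;
    }
    return array(
        'Content-Type: application/octet-stream',
        'Content-Length: ' . filesize($path),
        'Content-Disposition: attachment; filename="' . basename($path) . '"',
    );
}

// Send the file through PHP. Returns false when nothing could be sent.
function sendWithReadfile($path)
{
    $headers = readfileHeaders($path);
    if ($headers === null) {
        header('HTTP/1.0 404 Not Found');
        return false;
    }
    // Discard output buffers so readfile() writes straight to the client.
    while (ob_get_level() > 0) {
        ob_end_clean();
    }
    foreach ($headers as $line) {
        header($line);
    }
    return readfile($path) !== false;
}
```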


Combining solutions

In the end, the best way to send a file really fast, if you want your PHP code to be usable everywhere, is to have a configurable option somewhere, with instructions on how to activate it depending on the web server, and maybe auto-detection in your install script.

It is pretty similar to what is done in a lot of software for

  • Clean urls (mod_rewrite on apache)
  • Crypto functions (mcrypt php module)
  • Multibyte string support (mbstring php module)
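
One way such an option could look, as a sketch only: the method names ('x-sendfile', 'x-accel-redirect', 'readfile') and both function names are invented for this example. The detection only covers Apache with mod_php, and it assumes the module shows up as 'mod_xsendfile' in apache_get_modules(); anything else falls back to readfile.

```php
<?php
// Map a configured delivery method to the header that triggers it.
// Returns null when the caller should fall back to readfile().
function deliveryHeaders($method, $filePath, $internalUri = null)
{
    switch ($method) {
        case 'x-sendfile':
            // Apache (mod_xsendfile) and lighttpd: file system path
            return array('X-Sendfile: ' . $filePath);
        case 'x-accel-redirect':
            // nginx: URI of an internal location, not a file system path
            return array('X-Accel-Redirect: ' . $internalUri);
        default:
            return null;
    }
}

// Best-effort guess for an install script; an explicit setting in your
// configuration should always override it.
function detectDeliveryMethod()
{
    if (function_exists('apache_get_modules')
        && in_array('mod_xsendfile', apache_get_modules(), true)) {
        return 'x-sendfile';
    }
    return 'readfile';
}
```

In the download script you would send the Content-Type and Content-Disposition headers as usual, then either emit the returned header or call readfile() when the result is null.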
VirtualBlackFox
Is there any problem with doing some PHP works (check cookie/other GET/POST params against database) before doing `header("Location: " . $path);`?
afriza
No problem with that. The things you need to be careful with are sending content (print, echo), as the header must come before any content, and doing things after sending this header: it is not an immediate redirection, and code after it will be executed most of the time, but you have no guarantee that the browser won't cut the connection.
VirtualBlackFox
Jords: I didn't know that Apache also supported this; I'll add it to my answer when I have time. The only problem is that it isn't unified (nginx uses X-Accel-Redirect, for example), so a second solution is needed if the server doesn't support it.
VirtualBlackFox
Answer edited to add most of the possible solutions.
VirtualBlackFox
A: 

If you wish to hide where the file is located and let only people with specific privileges download it, then it is a good idea to use PHP as a relay; you have to sacrifice some CPU time to gain more security and control.

Bandpay
A: 

If you have the possibility to add PECL extensions to your PHP, you can simply use the functions from the Fileinfo package to determine the content type and then send the proper headers.
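
For example, a small sketch using the object-oriented Fileinfo API. The function name detectMimeType() and the application/octet-stream fallback are choices made for this example, not something Fileinfo prescribes.

```php
<?php
// Detect a file's MIME type from its content via the Fileinfo extension,
// falling back to a generic binary type when that is not possible.
function detectMimeType($path)
{
    $fallback = 'application/octet-stream';
    if (!class_exists('finfo') || !is_file($path)) {
        return $fallback;
    }
    // FILEINFO_MIME_TYPE asks for the type only, without the charset.
    $finfo = new finfo(FILEINFO_MIME_TYPE);
    $type = $finfo->file($path);
    return ($type === false) ? $fallback : $type;
}

// Usage:
// header('Content-Type: ' . detectMimeType($path));
```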

zolex
/bump, have you mentioned this possibility? :)
zolex
A: 

Have the web server provide it. Use PHP to process the inputs, then use a redirect.

Jay