The Situation

I am creating a video training site for a client on the RackSpace Cloud using the traditional LAMP stack (RackSpace's cloud has both Windows and LAMP stacks). The videos and other media files I'm serving on this site need to be protected, as my client charges money for access to them. There is no DRM or funny business like that. Essentially, we store the files outside of the web root, use mod_rewrite to run each request through PHP, and have PHP authenticate users before they are able to access the files.

So let's say the user requests a file at this URL:

http://www.example.com/uploads/preview_image/29.jpg

I am using mod_rewrite to rewrite that URL to:

http://www.example.com/files.php?path=%2Fuploads%2Fpreview_image%2F29.jpg

Here is a simplified version of the files.php script:

<?php
// Sets up the environment and sets $logged_in
// This part requires $_SESSION
require_once('../../includes/user_config.php');

if (!$logged_in) {
    // Redirect non-authenticated users
    header('Location: login.php');
}

// This user is authenticated, continue

$content_type = "image/jpeg";

// getAbsolutePathForRequestedResource() takes 
// a Query Parameter called path and uses DB
// lookups and some string manipulation to get
// an absolute path. This part doesn't have
// any bearing on the problem at hand
$file_path = getAbsolutePathForRequestedResource($_GET['path']);

// At this point $file_path looks something like
// this: "/path/to/a/place/outside/the/webroot"

if (file_exists($file_path) && !is_dir($file_path)) {
    header("Content-Type: $content_type");
    header('Content-Length: ' . filesize($file_path));
    echo file_get_contents($file_path);
} else {
    header('HTTP/1.0 404 Not Found'); 
    header('Status: 404 Not Found');
    echo '404 Not Found';
}
exit();

?>

The Problem

Let me start by saying this works perfectly for me. On local test machines it works like a charm. However, once deployed to the cloud, it stops working. After some debugging, it turns out that if a request to the cloud has certain file extensions like .JPG, .PNG, or .SWF (i.e. extensions of typically static media files), the request is routed to a cache system called Varnish. The end result of this routing is that by the time the request makes it to my PHP script, the session is not present.

If I change the extension in the URL to .PHP, or even just add a query parameter, Varnish is bypassed and the PHP script can get the session. No problem, right? I'll just add a meaningless query parameter to my requests!

Here is the rub: the media files I am serving through this system are being requested by compiled SWF files that I have zero control over. They are generated by third-party software, and I have no hope of adding to or changing the URLs that they request.

Are there any other options I have on this?

Update: I should note that I have verified this behavior with RackSpace support and they have said there is nothing they can do about it.

+2  A: 

If the requesting Flash app follows redirects, I would try to answer the first request with a redirect and serve the file on the second one, e.g. respond to

GET .../29.jpg

with

header("Status: 302 Moved temporarily");
header("Location: .../r.php?i=29.jpg&random=872938729348");

Then your r.php delivers the file on the second request.
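
The first half of that redirect approach could be sketched roughly as below. This is only an illustration: buildRedirectUrl() and redirectToDelivery() are hypothetical helper names, the r.php script and the i/random parameters are taken from the example above, and passing the full request path (rather than just 29.jpg) is an assumption made so that r.php can still tell directories apart.

```php
<?php
// Hypothetical helper: builds the redirect target for a protected file.
// The script name and the "i"/"random" parameters follow the example above.
function buildRedirectUrl(string $requestPath, string $script = '/r.php'): string
{
    // A random value makes each URL unique. According to the question,
    // the presence of a query string is what keeps the request out of
    // the cache, so the exact value does not matter.
    $nonce = random_int(100000, PHP_INT_MAX);

    // http_build_query() URL-encodes the values for us.
    return $script . '?' . http_build_query([
        'i'      => $requestPath,
        'random' => $nonce,
    ]);
}

// Hypothetical helper: answers the current request with a 302 and stops.
function redirectToDelivery(string $requestPath): void
{
    // The third argument to header() sets the HTTP response code.
    header('Location: ' . buildRedirectUrl($requestPath), true, 302);
    exit();
}

// Usage in the script that receives the rewritten request:
// redirectToDelivery($_GET['path']);
```

r.php would then run the same session check and file delivery as files.php, reading the path from $_GET['i'].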

If not (and, by the way, I would do this in any case), I would explicitly send headers that Varnish respects and acts on when delivering the static files, something like

header("Cache-Control: no-cache, must-revalidate, max-age=0, post-check=0, pre-check=0");
header("Expires: Sat, 26 Jul 1997 05:00:00 GMT");
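
A minimal sketch of how those headers might be combined with the file delivery, assuming the path has already been resolved and the user authenticated. noCacheHeaders() and sendProtectedFile() are hypothetical names, and whether the cache actually honors these headers depends on how the host has configured Varnish, so treat this as something to try rather than a guaranteed fix.

```php
<?php
// Hypothetical helper: the header lines that ask caches not to store
// the response. Kept as data so they are easy to inspect or adjust.
function noCacheHeaders(): array
{
    return [
        'Cache-Control: no-cache, must-revalidate, max-age=0',
        'Expires: Sat, 26 Jul 1997 05:00:00 GMT', // a date in the past
    ];
}

// Hypothetical helper: sends the no-cache headers, then the file itself.
function sendProtectedFile(string $filePath, string $contentType): void
{
    foreach (noCacheHeaders() as $line) {
        header($line);
    }
    header("Content-Type: $contentType");
    header('Content-Length: ' . filesize($filePath));

    // readfile() streams the file to the output instead of loading it
    // into a string first, which helps with large video files.
    readfile($filePath);
}

// Usage, after authentication and path lookup:
// sendProtectedFile($file_path, 'image/jpeg');
```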

Also, I would place an exit(); right after your first header() statement to be sure the rest of the script is not executed. header() only sends headers; it does not stop the script.

I also find it more reliable to use ob_start(), as stray whitespace in your PHP files may lead to annoying errors when adding headers.
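
Put together, the last two suggestions might look something like the sketch below. redirectTargetFor() and enforceLogin() are hypothetical names, and login.php is the redirect target from the question's script.

```php
<?php
// Start buffering before anything else, so stray whitespace or output
// from included files cannot cause "headers already sent" errors.
ob_start();

// Hypothetical helper: returns where to redirect, or null if the user
// may proceed.
function redirectTargetFor(bool $loggedIn, string $loginUrl = 'login.php'): ?string
{
    return $loggedIn ? null : $loginUrl;
}

// Hypothetical helper: redirects unauthenticated users and stops.
function enforceLogin(bool $loggedIn): void
{
    $target = redirectTargetFor($loggedIn);
    if ($target === null) {
        return; // authenticated, carry on
    }
    if (ob_get_level() > 0) {
        ob_end_clean(); // discard anything buffered so far
    }
    header('Location: ' . $target);
    exit(); // without this, the script would go on to send the file
}

// Usage, after user_config.php has set $logged_in:
// enforceLogin($logged_in);
```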

initall
Thanks for the suggestion! I got it working now!
macinjosh
+1 That is a great answer.
Byron Whitlock
A: 

I have the same situation, and I've contacted Rackspace hoping for a better answer.

I got one! They've put together a FAQ outlining half a dozen ways to bypass/modify the caching:

http://cloudsites.rackspacecloud.com/index.php/How_can_I_bypass_the_cache%3F

Cam
Ahhhh, to follow up on my own comment: I've jointly worked through all of their solutions in that document, and the only one that works is adding a query string to the end of the file URL. It's not a very good solution unless it's a one-off with a simple course or you built it yourself. If you have a ton of courses to convert, your content is user-submitted, or you don't have access to the course code, there's a good chance that solution may not work for you. Apparently this problem also affects the ability to host MediaWiki on Rackspace.
Cam