My client wanted a way to offer downloads to users, but only after they fill out a registration form (basically name and email). An email is sent to the user with links to the downloadable content. Each link contains a registration hash unique to the package, file, and user, and it actually points to a PHP page that logs the download and pushes the file out by writing it to stdout (along with the appropriate headers). This solution has inherent flaws, but it's how they wanted to do it. It needs to be said that I pushed them hard to (1) limit the sizes of the downloadable files and (2) think about using a CDN (they have international customers but are hosted in the US on 2 mirrored servers and a load balancer that uses sticky IPs).

Anyway, it "works for me," but some of their international customers are on really slow connections (download rates of ~60 kB/sec) and some of these files are pretty big (150 MB). Since a PHP script is serving these files, it is bound by the script timeout setting. At first I set this to 300 seconds (5 minutes), but that was not enough time for some of the beta users. Then I tried calculating the script timeout from the file size divided by an assumed 100 kB/sec connection, but some of these users are even slower than that.
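For reference, the serving script is conceptually along these lines. This is a simplified sketch, not the client's actual code; the hash-to-path mapping, the 100 kB/sec figure, and the 30-second grace period are placeholders I'm using for illustration:

```php
<?php
// Hedged sketch of the download script described above. The hash validation,
// logging, and hash-to-path mapping are placeholders, not the real site's code.

$hash = isset($_GET['h']) ? $_GET['h'] : '';

// ... validate $hash against the registration table and log the download here ...

$path = '/srv/downloads/' . basename($hash) . '.zip';   // hypothetical mapping from hash to file
if (!is_readable($path)) {
    header('HTTP/1.1 404 Not Found');
    exit;
}

$size = filesize($path);

// Scale the script timeout to the file size at an assumed 100 kB/sec floor,
// plus a small grace period; this is the estimate that proved too optimistic.
set_time_limit((int) ceil($size / (100 * 1024)) + 30);

header('Content-Type: application/octet-stream');
header('Content-Disposition: attachment; filename="' . basename($path) . '"');
header('Content-Length: ' . $size);

// Stream in chunks so memory stays flat even on 150 MB files.
$fp = fopen($path, 'rb');
while (!feof($fp)) {
    echo fread($fp, 8192);
    flush();
}
fclose($fp);
```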
Now the client wants to just up the timeout value. I don't want to remove the timeout altogether in case the script somehow gets into an infinite loop, and I don't want to keep pushing the timeout out arbitrarily to cover some catch-all, lowest-common-denominator connection rate (most people are downloading much faster than 100 kB/sec). I also want to be able to tell the client at some point, "Look, these files are too big to process this way. You are affecting the performance of the rest of the website with these 40-plus-minute connections. We either need to rethink how they are delivered or use much smaller files."
I have a few solutions in mind:
- CDN - Move the files to a CDN service such as Amazon's or Google's. We can still log the download attempts via the PHP file, but then redirect the browser to the real file (a quick sketch of that log-and-redirect idea is below, after the list). One drawback is that a user could bypass the script and download directly from the CDN once they have the URL (which could be gleaned by watching the HTTP headers). That isn't terrible, but it's not desired.
- Expand the server farm - Go from 2 to 4+ servers and remove the sticky IP rule from the load balancer. Downside: these are Windows servers, so they are expensive. There is no reason they couldn't be Linux boxes, but setting up all-new boxes could take more time than the client would allow.
- Set up 2 new servers strictly for serving these downloads - Basically the same benefits and drawbacks as #2, except that we could at least isolate the rest of the website from (and fine-tune the new servers to) this particular process. We could also pretty easily make these Linux boxes.
- Detect the user's connection rate - I had in mind a way to detect the user's current speed by using AJAX on the download landing page to time how long it takes to download a static file with a known size, then sending that info to the server and calculating the timeout from it. It's not ideal, but it's better than estimating the connection speed too high or too low. I'm not sure how I would get the speed info back to the server, though, since we currently use a redirect header that is sent from the server. (A rough sketch of the server side is also below, after the list.)
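For option 1, the change to the PHP page would be small: keep the existing hash validation and logging, then hand the browser off to the CDN copy instead of streaming the bytes. The CDN base URL here is a placeholder:

```php
<?php
// Sketch of option 1: the existing PHP page keeps doing the logging, then
// redirects to the CDN copy. The base URL and file naming are placeholders.

$hash = isset($_GET['h']) ? $_GET['h'] : '';

// ... validate the hash and log the download exactly as the current script does ...

// Redirect instead of streaming the bytes ourselves.
header('Location: https://cdn.example.com/packages/' . rawurlencode($hash) . '.zip');
exit;
```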
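For option 4, here is roughly how I picture the server side: the landing page times an AJAX fetch of a static file with a known size, works out bytes/sec on the client, and POSTs that number to a small PHP endpoint that stashes it in the session; the download script then reads it back when it calculates the timeout. The endpoint name, session key, and clamp bounds below are arbitrary placeholders:

```php
<?php
// record_speed.php (hypothetical name): receives the client-measured download
// rate and parks it in the session for the download script to use.

session_start();

$rate = isset($_POST['bytes_per_sec']) ? (float) $_POST['bytes_per_sec'] : 0;

// Clamp so a bogus measurement can't produce either a near-zero timeout
// (rate reported absurdly high) or an unbounded one (rate reported as ~0).
$rate = max(20 * 1024, min($rate, 10 * 1024 * 1024));   // 20 kB/sec to 10 MB/sec

$_SESSION['measured_rate'] = $rate;
echo 'ok';

// Then, in the download script, the fixed 100 kB/sec assumption becomes:
//   $rate = isset($_SESSION['measured_rate']) ? $_SESSION['measured_rate'] : 100 * 1024;
//   set_time_limit((int) ceil($size / $rate) + 30);
```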
Chances are #'s 1-3 will be declined or at least pushed off. So is 4 a good way to go about this, or is there something else I haven't considered?
(Feel free to challenge the original solution.)