ansaurus

Question

What is the best way to handle this: large download via PHP + slow connection from client = script timeout before file is completely downloaded

Answer 1

+1 A:

The easy solution would be to disable the timeout. You can do this on a per-request basis with:

set_time_limit(0);

If your script is not buggy, this shouldn't be problem – unless your server is not able to handle so many concurrent connections due to slow clients.

In that case, #1, #2 and #3 are two good solutions, and I would go with whichever is cheaper. Your concerns about #1 could be mitigated by generating download tokens that could only be used once, or for a small period of time.

Option #4, in my opinion, is not a great option. The speed can greatly vary during a download, so any estimate you would do initially would be, with a significant probability, wrong.

Artefacto 2010-08-31 15:24:33

setting the timeout to 0 is not desirable at all, "just in case".Do Amazon and Google have an API for setting/expiring these tokens? I'm familiar enough with the purpose of these services, but haven't looked into the implementation yet.

Chrisbloom7 2010-08-31 15:32:17

@Chris I think you'll have to write a small program that runs in EC2/AppEngine that does that job.

Artefacto 2010-08-31 15:54:12

set_time_limit(0) really is the best solution. "just in case" isnt an objection based on a concrete reason, and shouldnt be the reason to ignore the most obvious, simplest solution. If your download script is just loading bytes and sending them, there's no way it will get into an endless loop. If it stalls too long on the browser side, the client will disconnect, and the PHP script will be ended by the server. If you still dont want to do it, set it to something like 6 hours, this way you know for sure that the server will clean out any hanging php processes.

GrandmasterB 2010-08-31 17:46:48

I totally disagree with you on this. Why risk having the server hang from some silly error causing an endless loop. I pride myself on having plenty of safety checks in my code, but I would never leave a door like that open. The only time I ever set it zero is for scripts I'm running from the command line.

Chrisbloom7 2010-09-01 00:45:57

@Chris It's not particularly easy to create an endless loop in PHP... But even if there was one, so what? It wouldn't be the end of the word.

Artefacto 2010-09-01 00:54:07

Can't believe that's the general consensus on this point, but I guess it's just a pet peeve of mine then. Why take the risk. It's the weakest part of your code and it's easily avoidable. We have script timeouts for a reason. Anyway, that's not a viable solution to this problem, just a band aid and will not help the performance issues that we're already seeing.

Chrisbloom7 2010-09-01 01:04:10

I agree with Chris on this one. I'm equally surprised by `set_time_limit(0)` would be a generally accepted option. Here's a list of things that can go/may wrong just off the top of my head: 1) User disconnects, now your script may hang. 2) Attacker stalls the connection deliberately and deplete your server's resources. 3) The program gets into an endless loop <-- This can be extremely easy. Any non-trivial applications use a bunch of libraries, how do you know they don't get into endless loops? `set_time_limit(99999)` might have made some sense but `set_time_limit(0)` is really no-no to me.

kizzx2 2010-09-01 17:39:50

@kizzx2 1) if the user disconnects abnormally, PHP will still be able to detect it once it sends data, then stopping the script. 2) This is still possible even without PHP. A user throttling his connection to 0.5 KB/s can make a 2 MB image take over 1 hour to download. 3) If you're in an endless loop and sending data to the client, the client will eventually disconnect and kill the PHP script. If you're not sending data, well... it's just another bug; an easy one to detect and not particularly dangerous.

Artefacto 2010-09-01 21:58:59

@Artefacto Those are valid points. I tried hard to come up with more possible erorr scenarios because `set_time_out(0)` looks extremely ugly. Yet if PHP is smart enough most of the time, as you described, to detect all errorneous situations, then this default time-out thing seems unnecessary to me. Given that the hosting HTTP servers have their own timeouts, deducing from what you said, `set_time_out(0)` should probably become one of those mythical "best practices" and this time-out should probably be removed from the language.

kizzx2 2010-09-02 17:30:41

Answer 2

A:

I think the main problem is serving the file thourgh a PHP script. Not only you will have the timeout problem. Also there is a web server process running while the file is being sent to the client.

I would recommend some kind of #1. It don't has to be a CDN but the PHP script should redirect directly to the file. You might check the bypass using a rewrite rule and a param that will check if the param and the current request time match.

Kau-Boy 2010-08-31 15:25:12

Thanks for the suggestion. I'm leery of putting the files in a public facing folder as many of them are EXE files. I know I should be able lock them down with proper ACL/permissions but the permissions on this server occasionally get lost (or at least poorly copied) with the way they mirror their content. (Lots of little problems with this client tend to snowball into bigger problems and we can't fix them all at once.)

Chrisbloom7 2010-08-31 15:30:54

Than use a unix system that does not execute .exe files or pack them into archives.

Kau-Boy 2010-08-31 15:34:19

The merits of that solution hadn't escaped me, but I addressed the problems with switching to a *nix server in my original question.

Chrisbloom7 2010-08-31 15:41:43

Answer 3

A:

I think you might do something like #1 except keep it on your servers and bypass serving it via php directly. After whatever auth/approval needs to happen with php have that script create a temporary link to the file for dowwnload via traditional http. If on a *nix id do this via a symlink to the real file and have a cron job run every n minutes to clear old links to the file.

prodigitalson 2010-08-31 15:26:48

Answer 4

+1 A:

I am a bit reserved about #4. An attacker could forge a fake AJAX request to set your timeout to a very high value, then he can get you into an infinite loop. (If you were worried about that in the first place)

I would suggest a solution similar to @prodigitalson. You can make directories using hash values /downloads/389a002392ag02/myfile.zip which symlinks to the real file. Your PHP script redirects to that file which gets served by HTTP server. The symlink gets deleted periodically.

The added benefit for creating directory instead of a file is that end user doesn't see a mangled file name.

kizzx2 2010-08-31 15:31:08

Thanks for that suggestion. I do currently have a min/max limit on the timeout calculation, so that would take care of any spoofing in the AJAX scenario.

Chrisbloom7 2010-08-31 15:34:03

I do like the idea of using temp folders/files. The only drawback that I can see with that solution is that it relies on the load balancer having those sticky IPs. If we disable that later (which I hope to do at some point - there is a technical reason for having them in place now which requires some work to get rid of) then that might not work as the initial request could go to one server and the refresh go to another. I suppose I could always use the IP address of the originating server in the redirect though...

Chrisbloom7 2010-08-31 15:38:33

Answer 5

A:

You may create a temp file on the disk, or a symlink, and then redirect(using header()) to that temp file. Then a cronjob could come and remove "expired" temp files. The key here is that every download should have a unique temp file associated.

Quamis 2010-08-31 15:33:44

Answer 6

+1 A:

Use X-SENDFILE. Most webservers will support it either natively, or though a plugin (apache).

using this header you can simply specify a local file path and exit the PHP script. The webserver sees the header and serves that file instead.

Evert 2010-08-31 15:35:38

I assume the file has to be inside the public facing portion of the website, correct? Or does apache figure it out and serve it somehow even if it's outside the web root? I'm purposely keeping these files outside of the web root for security sake.

Chrisbloom7 2010-08-31 15:40:11

So looking at this option, it sounds like it will do the job. I just need to make sure it works under Windows. I will try it out and report back. Thanks for the tip.

Chrisbloom7 2010-08-31 16:22:15

Grrr, except I don't seem to be able to compile this under Snow Leopard properly. apxs compiles it properly, but Apache refuses to start. Says either "API module structure 'xsendfile_module' in file /Applications/MAMP/Library/modules/mod_xsendfile.so is garbled - expected signature 41503230 but saw 41503232 - perhaps this is not an Apache module DSO, or was compiled for a different Apache version?" or "Cannot load /Applications/MAMP/Library/modules/mod_xsendfile.so into server: cannot create object file image or add library" depending on what arch flags I set for apxs.

Chrisbloom7 2010-08-31 18:30:43

You're better off using macports :).

Evert 2010-08-31 20:50:44

Using macports for what? If you mean to run a MAMP setup, I wrestled with MacPorts for months to get it to work properly. The first time I had to completely wipe my MacBook and start over. The second time it totally fubared some dynamic library which led to the entire /var folder being corrupted. I very nearly lost the whole thing but was able to boot from the OSX DVD and repair the damage. After that, I tried MAMP Pro and fell in love with it. Couldn't be easier to get a nice MAMP environment running. I use MacPorts for managing some things like git and ruby, but I'm really happy with MAMP.

Chrisbloom7 2010-09-01 00:44:04

Besides, MacPorts won't help me in this case if I can't compile the mod_xsendfile module anyway

Chrisbloom7 2010-09-01 00:47:15

You can also use `virtual` if you're using an Apache module.

Artefacto 2010-09-01 00:55:01

There's a good chance macports has mod_xsendfile, but don't take it out on me if it didn't work. Just trying to help here :)

Evert 2010-09-01 08:13:39

Not taking it out on you, @Evert. Just saying MacPorts wont help me in this case (it's not there, I searched for it already). I have since found out that it actually compiles OK for the native Apache instance in Snow Leopard, but MAMP Pro's version of Apache doesn't load it. So I've taken it up with them.

Chrisbloom7 2010-09-02 12:55:03

Despite not being able to get X_Sendfile setup in my local MAMP environment, the Windows binary version "just worked" in Windows, and it definitely does the job. I added a small test to make sure that the mod_xsendfile module is loaded in Apache (just in case I can't get it working locally). If it is, I send the file to Apache to finish downloading and don't have to bother upping the script timeout. Or if it's not available I just fall back to the old code that forced the download, except I dropped the connection speed to 50Kb/sec in my calculation for the script timeout limit.

Chrisbloom7 2010-09-02 23:26:37

Oh, this solution has the added benefit of meeting all of the requirements of the spec: files outside the web root, validate and log each download attempt. Plus it required very little changes to the code. Surprised that I've never heard of this module before!

Chrisbloom7 2010-09-02 23:28:14

Ya it rocks =) Glad I could help

Evert 2010-09-03 15:01:55

ansaurus

tags:

views:

answers:

What is the best way to handle this: large download via PHP + slow connection from client = script timeout before file is completely downloaded

related questions