I'm trying to get my Django-based webapp into a working deployment configuration, and after spending a bunch of time trying to get it working under lighttpd / fastcgi, I can't get past this problem. When a client logs in for the first time, they receive a large data dump from the server, which is broken into several ~1MB chunks that are sent back as JSON.

Every so often, the client receives a truncated response for one of the chunks, and I see this message in the lighttpd logs:

2010-09-14 23:25:01: (mod_fastcgi.c.2582) unexpected end-of-file (perhaps the fastcgi process died): pid: 0 socket: tcp:127.0.0.1:8000 
2010-09-14 23:25:01: (mod_fastcgi.c.3382) response already sent out, but backend returned error on socket: tcp:127.0.0.1:8000 for /myapp.fcgi?, terminating connection 

I'm really pulling my hair out trying to figure out why this happens (which doesn't happen when running Django in ./manage.py runserver mode). The following are things I've tried that have had no effect:

  • Reducing the chunk size from 1MB to 256K. Even though the truncation usually happens at around the 600K - 900K mark, I still got truncations under the 256K chunk size.

  • Setting the minspare and maxchildren values on Django's runfcgi really high so that there will be lots of spare threads hanging around.

  • Setting maxchildren to 1 so that there is only one thread.

  • Switching between UNIX socket mode and TCP/IP mode for the fastcgi connection between lighttpd and Django.
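For anyone trying to reproduce this, a client-side check along the following lines can flag a truncated chunk. This is only a minimal sketch: it assumes each chunk is a single UTF-8 JSON document, and the function name chunk_is_truncated and the optional declared_length argument (the Content-Length value, if the server sent one) are hypothetical, not part of the app described above.

```python
import json


def chunk_is_truncated(body, declared_length=None):
    """Return True if a chunk body (bytes) looks cut off.

    declared_length is the Content-Length value, if one was sent;
    it is optional because the response may not carry one.
    """
    # Fewer bytes than the server promised means the stream ended early.
    if declared_length is not None and len(body) < declared_length:
        return True
    # A chunk cut mid-document fails to decode or parse.
    # UnicodeDecodeError and json.JSONDecodeError are both ValueErrors.
    try:
        json.loads(body.decode("utf-8"))
    except ValueError:
        return True
    return False
```

Calling this on every chunk in a loop against the server makes it possible to count how often the truncation occurs under each configuration.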

I've Googled a lot for this stuff but couldn't find anything that seemed to be a fix for Django (any help seemed to be around tweaking PHP settings).

My setup is:

  • OSX 10.6.4

  • Python 2.6.1 (system)

  • lighttpd installed from Macports (1.4.26_1+ssl)

  • flup installed from latest Python egg on flup website (tried both 1.0.2 stable and latest 1.0.3 devel)

  • Django 1.2.1 installed from tarball on Django website

The FastCGI block in my lighttpd config is:

fastcgi.server             = ("/myapp.fcgi" =>
                               ("django" =>
                                 (
                                  #"socket" => lighttpd_base + "fcgi.sock",
                                  "host" => "127.0.0.1",
                                  "port" => 8000,
                                  "check-local" => "disable",
                                  "max-procs" => 1,
                                  "debug" => 1
                                 )
                               )
                             )

The runfcgi command I'm using to start Django is currently:

./manage.py runfcgi daemonize=false debug=true host=127.0.0.1 port=8000 method=threaded maxchildren=1

If anyone has any insight into how to stop this from happening, the help would be much appreciated. If I can't solve this relatively quickly I will have to abandon lighttpd + fastcgi and look at Apache + mod_wsgi or perhaps nginx + fastcgi, and the prospect of going into another webserver config is not something I'm looking forward to ...

Thanks in advance for any help.

Edit: Additional Info

I found this page on the lighty forums indicating that it could be Django's fault (in that case, the cause was PHP crashing). I checked the Django side and discovered that even after a truncation, the Python thread that sent the truncated response is still running and serves subsequent requests, so it looks like the stream is not being broken by the thread hitting an exception and crashing out.

I wanted to figure out whether or not it was Django's fcgi impl or Lighttpd that was at fault here (because that will determine whether or not moving to nginx + fastcgi would actually solve anything), so I took a look at the packet trace in Wireshark. The simplified log of what happens just before a truncation is below:

No.     Time        Info
30082   233.411743  django > lighttpd [PSH, ACK] Seq=860241 Ack=869 Win=524280 Len=8184 TSV=417114153 TSER=417114153
30083   233.411749  lighttpd > django [ACK] Seq=869 Ack=868425 Win=524280 Len=0 TSV=417114153 TSER=417114153
30084   233.412235  django > lighttpd [PSH, ACK] Seq=868425 Ack=869 Win=524280 Len=8 TSV=417114153 TSER=417114153
30085   233.412250  lighttpd > django [ACK] Seq=869 Ack=868433 Win=524280 Len=0 TSV=417114153 TSER=417114153
30086   233.412615  django > lighttpd [PSH, ACK] Seq=868433 Ack=869 Win=524280 Len=8184 TSV=417114153 TSER=417114153
30087   233.412628  lighttpd > django [ACK] Seq=869 Ack=876617 Win=524280 Len=0 TSV=417114153 TSER=417114153
30088   233.412723  lighttpd > django [FIN, ACK] Seq=869 Ack=876617 Win=524280 Len=0 TSV=417114153 TSER=417114153
30089   233.412734  django > lighttpd [ACK] Seq=876617 Ack=870 Win=524280 Len=0 TSV=417114153 TSER=417114153
30090   233.412740  [TCP Dup ACK 30088#1] lighttpd > django [ACK] Seq=870 Ack=876617 Win=524280 Len=0 TSV=417114153 TSER=417114153
30091   233.413051  django > lighttpd [PSH, ACK] Seq=876617 Ack=870 Win=524280 Len=8 TSV=417114153 TSER=417114153
30092   233.413070  lighttpd > django [RST] Seq=870 Win=0 Len=0

Good packets are coming from Django at the start (30082 for 8184 bytes, and then again at 30086 for another 8184 bytes). Then, at entry 30088, Lighttpd for some reason sends a TCP FIN to Django, which is presumably what terminates the connection and causes the truncation.
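The same reading of the trace can be checked mechanically. The sketch below parses the tail of the simplified log (with the TSV/TSER fields trimmed for brevity) and reports which side sent the first FIN or RST, and how many bytes the other side sent afterwards. The helper names parse, first_close, and bytes_sent_after are made up for illustration, and the regular expression assumes the exact line layout shown above rather than Wireshark's export format in general.

```python
import re

# Tail of the simplified trace above; TSV/TSER fields trimmed.
TRACE = """\
30086   233.412615  django > lighttpd [PSH, ACK] Seq=868433 Ack=869 Win=524280 Len=8184
30087   233.412628  lighttpd > django [ACK] Seq=869 Ack=876617 Win=524280 Len=0
30088   233.412723  lighttpd > django [FIN, ACK] Seq=869 Ack=876617 Win=524280 Len=0
30089   233.412734  django > lighttpd [ACK] Seq=876617 Ack=870 Win=524280 Len=0
30090   233.412740  [TCP Dup ACK 30088#1] lighttpd > django [ACK] Seq=870 Ack=876617 Win=524280 Len=0
30091   233.413051  django > lighttpd [PSH, ACK] Seq=876617 Ack=870 Win=524280 Len=8
30092   233.413070  lighttpd > django [RST] Seq=870 Win=0 Len=0
"""

# number, time, optional "[TCP ...]" note, "src > dst", "[FLAGS]", then Len=N
LINE = re.compile(
    r"^(\d+)\s+[\d.]+\s+(?:\[TCP [^\]]*\]\s+)?"
    r"(\w+) > (\w+) \[([A-Z, ]+)\].*?Len=(\d+)"
)


def parse(trace):
    """Turn trace text into (number, sender, flags, length) tuples."""
    packets = []
    for line in trace.splitlines():
        m = LINE.match(line)
        if m:
            number, src, _dst, flags, length = m.groups()
            packets.append((int(number), src, set(flags.split(", ")), int(length)))
    return packets


def first_close(packets):
    """Return (packet number, sender) of the first FIN or RST, or None."""
    for number, src, flags, _length in packets:
        if flags & {"FIN", "RST"}:
            return number, src
    return None


def bytes_sent_after(packets, close_number, sender):
    """Sum payload bytes that `sender` sent after packet `close_number`."""
    return sum(
        length
        for number, src, _flags, length in packets
        if number > close_number and src == sender
    )


packets = parse(TRACE)
print(first_close(packets))
print(bytes_sent_after(packets, 30088, "django"))
```

On this excerpt the first close comes from lighttpd at packet 30088, and Django sends a further 8 bytes afterwards, which is what draws the RST.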

On the face of it, it seems like this is Lighttpd's fault, since it looks like it is shutting things down before it's supposed to ... although I'm not sure that it isn't doing this because it has received some bad data from Django to which it reacts by shutting down.

A: 

Are you sure you don't have something else already listening on port 8000? That port is conventionally used for HTTP servers, and it would be a bad idea to run a FastCGI process on it. I suggest you use a different port nowhere near the 8xxx range.

Graham Dumpleton
Hi Graham. Yes I'm sure port 8000 is not the problem, because (1) if something was already bound to port 8000 then Django would bail out at startup with a "can't bind to address" error (2) if something else other than my Django server was sending data back to lighttpd on this port, I would get data-format errors on the client on every response, rather than intermittently like I'm seeing (3) the same problem happens when using UNIX sockets rather than TCP/IP on port 8000 (4) I tried it on port 34567 just for completeness and got the truncation on my first try.
glenc
A: 

For what it's worth, I did eventually end up jumping ship to nginx and everything seems to work fine, so my suspicion that it was lighttpd's fault rather than a buggy fcgi impl in Django seems to have been well-founded.

I actually found the setup of nginx much easier than lighttpd, not to mention that you can install a macport of nginx (port install nginx +ssl) that does not contain the SSL-breaking bug that lighttpd suffers from here.

glenc