Hello there,
Recently I installed trac in one of my dreamhost domains. I followed the instructions of http://trac.mlalonde.net/wiki/CreamyTrac and everything worked perfectly. At least i thought that was the case.
After a few days, i started to notice that i was getting random 500 pages. I quickly checked the error log, and found a bunch of:
[Fri Apr 16 14:35:34 2010] [error] [client *.*.*.*] Premature end of script headers: dispatch.fcgi
[Fri Apr 16 14:35:54 2010] [error] [client *.*.*.*] Premature end of script headers: dispatch.fcgi, referer: http://www.trac.****.com/login
[Fri Apr 16 16:05:58 2010] [error] [client *.*.*.*] Premature end of script headers: dispatch.fcgi, referer: http://www.trac.****.com/timeline
The trac instance has very low traffic so it is possible that this might have been triggered faster if there was more use of it.
So next step i went to look at top and am amazed by what i see: (this is NOT exactly what i saw, i just reproduced this from memory)
23730 rl_inst 20 0 44516 15m 4012 S 0.0 0.4 0:00.17 python2.4
23731 rl_inst 20 0 44616 15m 4012 S 0.0 0.4 0:03.17 python2.4
23732 rl_inst 20 0 44116 15m 4012 S 0.0 0.4 0:01.17 python2.4
23733 rl_inst 20 0 44826 15m 4012 S 0.0 0.4 0:04.17 python2.4
23734 rl_inst 20 0 44216 15m 4012 S 0.0 0.4 0:02.17 python2.4
23735 rl_inst 20 0 44416 15m 4012 S 0.0 0.4 0:01.17 python2.4
I opened the trac site again and that changed to this:
23730 rl_inst 20 0 44516 15m 4012 S 0.0 0.4 0:00.17 python2.4
23731 rl_inst 20 0 44616 15m 4012 S 0.0 0.4 0:03.17 python2.4
23732 rl_inst 20 0 44116 15m 4012 S 0.0 0.4 0:01.17 python2.4
23733 rl_inst 20 0 44826 15m 4012 S 0.0 0.4 0:04.17 python2.4
23734 rl_inst 20 0 44216 15m 4012 S 0.0 0.4 0:02.17 python2.4
23735 rl_inst 20 0 44416 15m 4012 S 0.0 0.4 0:01.17 python2.4
28378 rl_inst 20 0 2608 1208 1008 S 0.0 0.0 0:00.00 dispatch.fcgi
28382 rl_inst 20 0 44248 15m 4012 S 0.0 0.4 0:02.19 python2.4
So the dispatch.fcgi was being called correctly and was spawning correctly other python2.4 process. After the IDLE time passed this is the results:
23730 rl_inst 20 0 44516 15m 4012 S 0.0 0.4 0:00.17 python2.4
23731 rl_inst 20 0 44616 15m 4012 S 0.0 0.4 0:03.17 python2.4
23732 rl_inst 20 0 44116 15m 4012 S 0.0 0.4 0:01.17 python2.4
23733 rl_inst 20 0 44826 15m 4012 S 0.0 0.4 0:04.17 python2.4
23734 rl_inst 20 0 44216 15m 4012 S 0.0 0.4 0:02.17 python2.4
23735 rl_inst 20 0 44416 15m 4012 S 0.0 0.4 0:01.17 python2.4
28382 rl_inst 20 0 44248 15m 4012 S 0.0 0.4 0:02.19 python2.4
dispatch.fcgi was gone but the corresponding python2.4 was still there o_O. I had started with 5 sleeping processes of python and after idle time i ended up with 6. I repeated this and found that it kept spawning more and more python2.4, this was indeed what was causing my 500s indirectly. Let me explain my findings.
Am on a shared hosting so my processes get killed if they are way too many. So everytime i opened trac 2 new processes spawned and one remained, to the point that dreamhost was killing a random process when i reached my limit. Sometimes killing the python2.4 that actually was rendering the current page. Hence header premature ending, python is gone the .fcgi doesnt know what to do and throws an 500.
I found a dirty solution to this. I changed my dispatch.fcgi to contain a line that killed any currently running python2.4 process and then spawn a new one. Since then i dont get any rogues what so ever. But i dont think this is the best solution, calling killall in every fcgi just seems wrong.
Anyone has run into this issue and found a cleaner solution? Is there anything that i have overlooked?