This is related to this http://stackoverflow.com/questions/926579/configure-apache-to-recover-from-modpython-errors, although I've since stopped assuming that this has anything to do with mod_python. Essentially, I have a problem that I wasn't able to reproduce consistently and I wanted some feedback on whether the proposed solution seems likely and some potential ways to try and reproduce this problem.
The setup: a django-powered site would begin throwing errors after a few days of use. They were always ImportError
s or ImproperlyConfigured
errors, which amount to the same thing, since the message always specified trouble loading some module referenced in the settings.py file. It was not generally the same class. I am using preforked apache with 8 forked children, and whenever this problem would come up, one process would be broken and seven would be fine. Once broken, every request (with Debug On in the apache conf) would display the same trace every time it served a request, even if the failed load is not relevant to the particular request. An httpd restart
always made the problem go away in the short run.
Noted problems: installation and updates are performed via svn with some post-update scripts. A few .pyc
files accidentally were checked into the repository. Additionally, the project itself was owned by one user (not apache, although apache had permissions on the project) and there was a persistent plugin that ended up getting backgrounded as root. I call these noted problems because they would be wrong whether or not I noticed this error, and hence I have fixed them. The project is owned by apache and the plugin is backgrounded as apache. All .pyc
files are out of the repository, and they are all force-recompiled after each checkout while the server and plugin have been stopped.
What I want to know is
- Do these configuration disasters seem like a likely explanation for sporadic
ImportError
s? - If there is still a problem somewhere else in my code, how would I best reproduce it?
As for 2, my approach thus far has been to write some stress tests that repeatedly request the same page so as to execute common code paths.
Incidentally, this has been running without incident for about 2 days since the fix, but the problem was observed with 1 to 10 day intervals between.