I'm just starting to work on a Tornado application that is having some CPU issues. CPU usage grows monotonically over time until it maxes out at 100%. The system is currently designed to not block the main thread: if it needs to do something that blocks and asynchronous drivers aren't available, it spawns another thread to do the blocking operation.
Thus we have the main thread being almost totally CPU-bound and a bunch of other threads that are almost totally IO-bound. From what I've read, this seems to be the perfect way to run into problems with the GIL. Plus, my profiling shows that we're spending a lot of time waiting on signals (which I'm assuming is what __semwait_signal is doing), which is consistent with the effects the GIL would have in my limited understanding.
If I use sys.setcheckinterval to set the check interval to 300, the CPU growth slows down significantly. What I'm trying to determine is whether I should increase the check interval further, leave it at 300, or be wary of raising it at all. After all, CPU usage improves, but I'm a bit concerned that this will negatively impact the system's responsiveness.
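To put a number on the responsiveness concern, something like the following rough sketch could be run under each candidate interval. It is not the application's code: the helper names (set_check_interval, measure_wakeup_delay, compare_intervals) are made up for illustration, the busy loop is only a stand-in for the CPU-bound main thread, and the napping thread is only a stand-in for an IO-bound worker. It assumes an interpreter where sys.setcheckinterval still exists (Python 2, where the default interval is 100); on interpreters without it the setter is skipped and only the measurement runs.

```python
import sys
import threading
import time
import warnings

# time.monotonic is missing on Python 2, so fall back to time.time there.
_now = getattr(time, "monotonic", time.time)


def set_check_interval(ticks):
    """Apply a check interval if this interpreter supports it.

    Returns True if sys.setcheckinterval was called, False if it is absent.
    """
    setter = getattr(sys, "setcheckinterval", None)
    if setter is None:
        return False
    with warnings.catch_warnings():
        # Some Python 3 versions emit a DeprecationWarning for this call.
        warnings.simplefilter("ignore")
        setter(ticks)
    return True


def measure_wakeup_delay(spin_seconds=0.5, nap=0.005):
    """Worst overshoot (seconds) of a napping thread while this thread spins."""
    delays = []
    stop = threading.Event()

    def napper():
        # Stand-in for an IO-bound worker: it should wake after `nap` seconds,
        # but it must reacquire the GIL before it can record the time.
        while not stop.is_set():
            start = _now()
            time.sleep(nap)
            delays.append(_now() - start - nap)

    worker = threading.Thread(target=napper)
    worker.start()

    # Stand-in for the CPU-bound main thread.
    deadline = _now() + spin_seconds
    counter = 0
    while _now() < deadline:
        counter += 1

    stop.set()
    worker.join()
    return max(delays) if delays else 0.0


def compare_intervals(intervals=(100, 300, 1000), spin_seconds=0.5):
    """Return (interval, worst wakeup delay) pairs for each candidate value."""
    results = []
    for ticks in intervals:
        set_check_interval(ticks)
        results.append((ticks, measure_wakeup_delay(spin_seconds)))
    return results
```

Calling compare_intervals() and printing the pairs would show how much later the waiting thread wakes up as the interval grows. The absolute values depend heavily on the machine and on what the real main thread is doing, so this only indicates the direction and rough size of the trade-off.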
Of course, the correct answer is probably that we need to rethink our architecture to take the GIL into account. But that isn't something that can be done immediately. So how do I determine the appropriate course of action to take in the short term?