A rather confusing sequence of events happened, according to my log-file, and I am about to put a lot of the blame on the Python logger, which is a bold claim. I thought I should get some second opinions about whether what I am saying could be true.

I am trying to explain why there are several large gaps (around two minutes at a time) in my log file during stressful periods for my application, when it is missing deadlines.

I am using Python's logging module on a remote server, and have set it up, via a configuration file, to email me all logs of severity ERROR or higher. Typically, only one error is sent at a time, but during periods of sustained problems I might get a dozen in a minute - annoying, but nothing that should stress SMTP.
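A minimal sketch of that setup, done in code rather than a config file (the handler details, logger name, and addresses here are placeholders, not my actual configuration):

```python
import logging
import logging.handlers

logger = logging.getLogger("myapp")
logger.setLevel(logging.DEBUG)

# Emails for ERROR and above (mailhost/addresses are placeholders).
smtp = logging.handlers.SMTPHandler(
    mailhost="localhost",
    fromaddr="app@example.com",
    toaddrs=["me@example.com"],
    subject="Application error",
)
smtp.setLevel(logging.ERROR)
logger.addHandler(smtp)

# Everything (DEBUG and above) also goes to a local file.
file_handler = logging.FileHandler("app.log")
file_handler.setLevel(logging.DEBUG)
logger.addHandler(file_handler)
```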

I believe that, after a short spurt of such messages, the Python logging system (or perhaps the SMTP system it sits on) is encountering errors or congestion. The call to Python's logger then BLOCKS for two minutes, causing my thread to miss its deadlines. (I was smart enough to move the logging to after the critical path of the application - so I don't care if logging takes a few seconds, but two minutes is far too long.)

This seems like a rather awkward architecture (both for a logging system that can freeze up, and for an SMTP system (Ubuntu, sendmail) that cannot handle dozens of emails in a minute**), so this surprises me, but it exactly fits the symptoms.

Has anyone had any experience with this? Can anyone describe how to stop it from blocking?

** EDIT #2: I actually counted: 170 emails in two hours. Forget the previous edit; I counted wrong. It's late here...

+1  A: 

A two-minute pause sounds like a timeout - most probably in the networking stack.

Try adding:

*                -       nofile          64000

to your /etc/security/limits.conf file on all of the machines involved and then rebooting all of the machines to ensure it is applied to all running services.
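To check how close you are to the limit before changing anything, something like this (commands assume Linux):

```shell
# Current per-process limit on open file descriptors (sockets count too)
ulimit -n

# System-wide limit on open files
cat /proc/sys/fs/file-max
```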

Benjamin Franz
To make sure I understand: you are suggesting that something (sendmail and/or Python's logging module) is hitting a default security limit on how many files it can open (one per email), and that adding this (presumably larger) value will raise that limit.
Oddthinking
Right. If you are running into the limit on open filehandles (including sockets in the 'TIME_WAIT' state), the OS will prevent you from opening any more until the number falls below the limit. In the case of TCP TIME_WAIT sockets, that happens when the normal timeout (which takes a couple of minutes) expires.
Benjamin Franz
Thanks. I will rig up a stress-test on SMTP, and then test this out during a quiet period for the server - probably in the next few days. (In the meantime, I stopped sending emails, and only log the errors to a file.)
Oddthinking
+1  A: 

Stress-testing was revealing:

My logging configuration sent critical messages to SMTPHandler, and debug messages to a local log file.

For testing, I created a moderately large number of threads (e.g. 50) that waited for a trigger and then simultaneously tried to log either a critical message or a debug message, depending on the test.

Test #1: All threads send critical messages. This revealed that the first critical message took about 0.9 seconds to send, the second around 1.9 seconds, and the third longer still, quickly adding up. It seems that the messages that go to email block, waiting for each other to complete the send.

Test #2: All threads send debug messages. These ran fairly quickly, from hundreds to thousands of microseconds each.

Test #3: A mix of both. It was clear from the results that debug messages were also blocked, waiting for the critical messages' emails to go out.
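A sketch of the stress-test (using a Barrier as the trigger; the NullHandler stands in for my real handlers, and the exact timings will of course vary):

```python
import logging
import threading
import time

logger = logging.getLogger("stress")
logger.addHandler(logging.NullHandler())  # swap in SMTPHandler/FileHandler to reproduce

NUM_THREADS = 50
barrier = threading.Barrier(NUM_THREADS)
durations = []
lock = threading.Lock()

def worker(level):
    barrier.wait()              # all threads fire at once
    start = time.perf_counter()
    logger.log(level, "stress-test message")
    elapsed = time.perf_counter() - start
    with lock:
        durations.append(elapsed)

# For test #1, every thread logs CRITICAL; for #2, DEBUG; for #3, a mix.
threads = [threading.Thread(target=worker, args=(logging.CRITICAL,))
           for _ in range(NUM_THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(f"slowest call: {max(durations):.6f}s")
```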

So, it wasn't that the two minutes indicated a timeout; rather, the two minutes represented a large number of threads blocked waiting in the queue.

Why were so many critical messages being sent at once? That's the irony. There was a logging.debug() call inside a method that included a network call. I had some code monitoring the speed of that method (to see if the network call was taking too long); if it was, it (of course) logged a critical error, which sent an email. The next thread then blocked on the logging.debug() call, which meant it missed its deadline, triggering another email, which caused yet another thread to run slowly.

The two-minute delay in one thread wasn't a network timeout. It was one thread waiting for another thread that was blocked for 1 minute 57 seconds - because that one was waiting for another thread blocked for 1 minute 55 seconds, and so on.

This isn't very pretty behaviour from SMTPHandler.

Oddthinking
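For what it's worth, later Python versions (3.2+) provide a standard way to keep a slow handler like SMTPHandler from blocking the calling thread: hand records off through a QueueHandler to a QueueListener that runs the slow handler in its own thread. A sketch (using a StreamHandler writing to a StringIO as a stand-in for SMTPHandler):

```python
import io
import logging
import logging.handlers
import queue

log_queue = queue.Queue(-1)  # unbounded queue

# Stand-in for the slow handler; SMTPHandler would go here instead.
stream = io.StringIO()
slow_handler = logging.StreamHandler(stream)

# The listener thread pulls records off the queue and runs the slow handler.
listener = logging.handlers.QueueListener(log_queue, slow_handler)
listener.start()

logger = logging.getLogger("app")
logger.addHandler(logging.handlers.QueueHandler(log_queue))

# This call only enqueues the record, so it returns immediately
# even if the downstream handler is slow.
logger.critical("this call returns immediately")

listener.stop()  # joins the listener thread, flushing pending records
```

With this arrangement, the deadline-sensitive threads only ever pay the cost of a queue put; the SMTP send happens entirely on the listener's thread.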