I've got a strange issue with a server accepting TCP connections. Even though some worker processes are normally idle and waiting, the server hangs once the connection volume reaches a certain level.
Long version:
The server is written in Perl and binds a $srv socket with the reuse flag and listen == 5. Afterwards, it forks into 10 processes with a loop of $clt=$srv->accept(); do_processing($clt); $clt->shutdown(2);
The client, written in C, is also very simple: it sends some lines, then receives all available lines and does a shutdown(sockfd, 2);
There's nothing async going on and at the end both send and receive queues are empty (as reported by netstat).
Connections last only ~20ms. All clients behave the same way, are the same implementation, etc. Now let's say I'm accepting X connections from client 1 and another X from client 2. Processes still report that they're idle all the time. If I add another X connections from client 3, suddenly the server processes start hanging just after accepting. The first blocking thing they do after accept(); is while (<$clt>) ... - but they don't get any data (on the first try already). Suddenly all 10 processes are in this state and do not stop waiting. On strace, the server processes seem to hang on read(), which makes sense.
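One way to keep a silent connection from pinning a worker in read() forever would be a receive timeout on the accepted socket. The server here is Perl, so the C below is only a sketch of the idea, not the server's actual code; set_read_timeout and read_or_timeout are hypothetical helper names, and the timeout value is whatever suits the workload.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: limit how long a blocking read on fd may wait.
 * Returns 0 on success, -1 on failure (as setsockopt does). */
int set_read_timeout(int fd, long millis)
{
    struct timeval tv;
    tv.tv_sec = millis / 1000;
    tv.tv_usec = (millis % 1000) * 1000;
    return setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv);
}

/* Hypothetical helper: one read that reports a timeout separately.
 * Returns bytes read, 0 on EOF, or -1 on failure. *timed_out is set
 * to 1 only when the receive timeout expired; any other error
 * (including EINTR) leaves it at 0. */
ssize_t read_or_timeout(int fd, char *buf, size_t len, int *timed_out)
{
    ssize_t n = read(fd, buf, len);
    *timed_out = 0;
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        *timed_out = 1;
    return n;
}
```

A worker that sees timed_out set could drop the connection and go back to accept() instead of waiting indefinitely. This only limits the damage; it does not explain why the peer never sends anything.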
There are loads of connections in TIME_WAIT state belonging to that server (~100 when the problem starts to manifest), but this might be a red herring.
What could be happening here?
After some more analysis: it turned out that the client was at fault, not closing previous connections properly before trying the next one. The servers at the beginning of the load-balancing list were left with stale connections, so the workers that accepted them blocked in read() waiting for data that never came.
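The client code isn't shown here, so the following is only a sketch of one way to release a connection fully before moving on to the next server in the list. close_connection is a hypothetical helper name. It relies on the fact that shutdown() alone does not free the descriptor; close() is still needed for that.

```c
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: tear down a connection completely before the
 * client tries the next server. Sets *fd to -1 afterwards so a stale
 * descriptor cannot be reused by accident.
 * Returns 0 on success (or if already closed), -1 if close() failed. */
int close_connection(int *fd)
{
    int rc;

    if (*fd < 0)
        return 0;                  /* nothing open, nothing to do */

    /* Same as shutdown(fd, 2): stop both directions so the peer sees
     * end-of-file. The result is ignored because the socket may
     * already be disconnected. */
    shutdown(*fd, SHUT_RDWR);

    rc = close(*fd);               /* actually release the descriptor */
    *fd = -1;
    return rc;
}
```

Calling this on the current socket before every new connect() attempt, including on error paths, means no half-used connection is left sitting on an earlier server.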