We are moving large amounts of data across a LAN, and it has to happen very quickly and reliably. Currently we use Windows TCP sockets from C++. Large (synchronous) sends move the data much faster than many smaller (synchronous) sends, but they frequently stall for long stretches (about .15 seconds), which makes the overall transfer rate plummet. The stall happens only in very particular circumstances, which makes me believe it should be preventable altogether. More importantly, if we don't know the cause, we can't be sure it won't also happen some time with smaller sends. Can anyone explain this deadlock?
Deadlock description (OK, zombie-locked: it isn't dead, but it stops for .15 or so seconds and then starts again):
1. The receiving side sends an ACK.
2. The sending side sends a packet containing the end of a message (the push flag is set).
3. The call to socket.recv takes about .15 seconds(!) to return.
4. At about the time the call returns, the receiving side sends an ACK.
5. The next packet from the sender is finally sent (why is it waiting? the TCP window is plenty big).
The odd thing about (3) is that the call typically takes hardly any time at all while receiving exactly the same amount of data. On a 2 GHz machine, .15 seconds is about 300 million clock cycles. I am assuming the call doesn't (heaven forbid) wait for the received data to be ACKed before it returns, so either the ACK is waiting for the call to return, or both are being delayed by something else.
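For anyone who wants to reproduce the measurement, here is a minimal sketch of how the recv timing could be captured and how the cycle estimate works out. The names elapsed_ms and cycles_during are hypothetical helpers invented for this illustration; they are not part of our actual code, and the real logging may differ.

```cpp
#include <chrono>
#include <utility>

// Hypothetical helper: runs `call` once and returns how long it took,
// in milliseconds, measured with a monotonic clock.
template <typename F>
double elapsed_ms(F&& call) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<F>(call)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Hypothetical helper: number of clock cycles a CPU running at `clock_hz`
// goes through during `seconds` of wall time.
constexpr double cycles_during(double seconds, double clock_hz) {
    return seconds * clock_hz;
}
```

Wrapping the receive call would look something like elapsed_ms([&] { n = recv(sock, buf, len, 0); }), where sock, buf, len and n are placeholders. cycles_during(0.15, 2e9) gives the roughly 300 million cycles quoted above.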
The problem NEVER happens when a second packet of data (part of the same message) arrives between (1) and (2). That very clearly makes it sound related to the fact that Windows TCP will not send back a no-data ACK until either a second packet arrives or a 200 ms timer expires. However, the delay is less than 200 ms (it's more like 150 ms).
The third unseemly character (and to my mind the real culprit) is (5). Send is definitely being called well before that .15 seconds is up, but the data NEVER hits the wire before that ACK returns. That is the most bizarre part of this deadlock to me. It's not a TCP blockage, because the TCP window is plenty big: we set SO_RCVBUF to something like 500*1460 bytes (which is still under a meg). The data is coming in very fast (basically there is a loop spinning out data via send), so the buffer should fill almost immediately. MSDN mentions that there are various "heuristics" used in deciding when a send hits the wire, and that an already pending send plus a full buffer will cause send to block until the data hits the wire (otherwise send apparently just copies the data into the TCP send buffer and returns).
Anyway, why the sender doesn't actually send more data during that .15 second pause is the most bizarre part to me. The information above was captured on the receiving side via Wireshark (except, of course, the socket.recv return times, which were logged to a text file). We tried changing the send buffer to zero and turning off Nagle on the sender. (Yes, I know Nagle is about not sending small packets, but we tried turning it off in case it was part of the unstated "heuristics" affecting whether the message would be posted to the wire. Technically, Microsoft's Nagle is that a small packet isn't sent if the buffer is full and there is an outstanding ACK, so it seemed like a possibility.)
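For reference, the socket settings described above amount to roughly the following. This is a sketch rather than our exact code: set_int_option, configure_sender and configure_receiver are hypothetical names, the socket is assumed to be already created (and WSAStartup already called on Windows), and the non-Windows branch is there only so the snippet compiles elsewhere.

```cpp
#ifdef _WIN32
  #include <winsock2.h>
  #include <ws2tcpip.h>
  using socket_t = SOCKET;
#else
  #include <netinet/in.h>
  #include <netinet/tcp.h>
  #include <sys/socket.h>
  using socket_t = int;
#endif

// Receive buffer size from the question: 500 full-size 1460-byte segments.
constexpr int kRecvBufBytes = 500 * 1460;  // 730,000 bytes, under a meg

// Hypothetical helper: sets one int-valued socket option.
// Returns true if setsockopt reported success.
inline bool set_int_option(socket_t s, int level, int name, int value) {
    return setsockopt(s, level, name,
                      reinterpret_cast<const char*>(&value),
                      sizeof(value)) == 0;
}

// Sender side: turn Nagle off and request a zero-byte send buffer.
inline bool configure_sender(socket_t s) {
    bool ok = set_int_option(s, IPPROTO_TCP, TCP_NODELAY, 1);
    ok = set_int_option(s, SOL_SOCKET, SO_SNDBUF, 0) && ok;
    return ok;
}

// Receiver side: request the large receive buffer.
inline bool configure_receiver(socket_t s) {
    return set_int_option(s, SOL_SOCKET, SO_RCVBUF, kRecvBufBytes);
}
```

Note that the operating system may adjust the requested buffer sizes, so reading them back with getsockopt is the only way to know what was actually applied.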