We're running Debian with a 2.6.16 kernel and iptables enabled. The system runs a custom-made HTTP proxy under a mild load (the proxy works fine under the same load at other sites). It consists of 4 servers behind a load balancer with a virtual IP, which in turn sits behind an array of 4 ISA 2004 machines, so the basic topology is:

Client -> ISA [1-4] -> Load Balancer -> Our Proxy [1-4] -> The Internet

Occasionally, an ISA server sends us a SYN packet to which no SYN-ACK is sent. It retries after 3 seconds, and a third time after another 6 seconds, after which it reports the proxy as down and switches to a direct connection. Throughout this period (before, between, and after those 3 SYNs), other SYNs from the same ISA server arrive and are answered successfully.
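The retry pattern described above (a retry after 3 seconds, then another after 6 more) is consistent with exponential backoff of the SYN retransmission timer. A minimal Python sketch of that schedule, assuming a 3-second initial timeout that doubles after each attempt and two retransmissions (values inferred from the observed timings, not taken from ISA 2004 documentation); `syn_send_times` is a hypothetical helper:

```python
def syn_send_times(initial_timeout=3.0, retransmissions=2):
    """Return the times (seconds after the first SYN) at which each SYN is sent.

    Assumes the retransmission timeout starts at `initial_timeout` and doubles
    after every unanswered attempt (classic TCP exponential backoff).
    """
    times = [0.0]  # the original SYN
    timeout = initial_timeout
    for _ in range(retransmissions):
        times.append(times[-1] + timeout)  # next SYN once the current timeout expires
        timeout *= 2  # back off before the following attempt
    return times


print(syn_send_times())  # [0.0, 3.0, 9.0]
```

Under these assumptions, all three SYNs for a failing connection fall within a 9-second window, while the same ISA server keeps opening other connections successfully in the meantime.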

A very similar problem has been reported by others (with no solution, however):

All of these reports come from CentOS, a Linux distribution whose peculiarity is having iptables enabled by default.

http://www.linuxhelpforum.com/showthread.php?t=931912&mode=linear
http://www.centos.org/modules/newbb/viewtopic.php?topic_id=16147

Almost the same, but slightly different: http://www.linuxquestions.org/questions/linux-networking-3/tcp-handshake-fails-synack-ignored-by-system.-637171/

Also seems to be relevant: http://groups.google.com/group/comp.os.linux.networking/browse_thread/thread/b1c000e2d65e0034

I suspect iptables is the culprit, but any additional feedback is welcome.

+2  A: 

Look at the second parameter of the listen() call, as mentioned in the first link you posted. It is the maximum number of pending (not yet accepted) connections. According to the listen(2) man page, if the protocol supports retransmission (as TCP does), a connection request that arrives while the queue is full may be ignored, so that a later retransmission can succeed once there is room in the queue again.

CesarB
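A minimal Python sketch of where that parameter is set, assuming the proxy uses a standard BSD-style socket API (the original proxy code is not shown; the `LISTEN_BACKLOG` value and the `make_listener` helper are hypothetical):

```python
import socket

# Hypothetical value; choose it based on the expected burst of incoming connections.
LISTEN_BACKLOG = 1024


def make_listener(host="127.0.0.1", port=0, backlog=LISTEN_BACKLOG):
    """Create a TCP listening socket with an explicit listen() backlog.

    On Linux (2.2 and later) the backlog limits the queue of fully established
    connections waiting to be accept()ed; larger values are silently capped at
    net.core.somaxconn. Half-open (SYN received) connections are limited
    separately by net.ipv4.tcp_max_syn_backlog.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, port))  # port 0 lets the kernel choose a free port
    sock.listen(backlog)
    return sock


if __name__ == "__main__":
    server = make_listener()
    port = server.getsockname()[1]
    client = socket.create_connection(("127.0.0.1", port))
    conn, addr = server.accept()  # take one connection off the accept queue
    print("accepted connection from", addr)
    for s in (conn, client, server):
        s.close()
```

Raising the backlog in the application alone may not be enough, since the kernel caps it at `net.core.somaxconn`.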
This does not explain why other requests to the same port succeed while these fail.
The other requests probably get lucky and arrive after the queue has drained a bit (i.e., after the server has accepted some of the pending connections).
CesarB
Then the next SYN would have gotten in, or the one after it. But if one SYN is left unanswered, all three are unanswered. I find it hard to explain that by luck alone.
"...hard to explain by dumb luck". So, increase the depth of the listen queue and you'll find out.
Tall Jeff
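One way to test the listen-queue theory without guessing is to watch the kernel's listen-queue counters. A minimal Python sketch that parses text in the /proc/net/netstat format (pairs of header and value lines per section, each prefixed with a section name such as `TcpExt:`) and extracts `ListenOverflows` and `ListenDrops`; the helper names are hypothetical, and these counters may not be exposed by every kernel version, possibly including older ones such as 2.6.16:

```python
import os


def parse_proc_netstat(text):
    """Parse /proc/net/netstat-style text into {section: {counter: value}}.

    Each section is a header line listing counter names followed by a line
    with the matching values, both prefixed by the section name (e.g. "TcpExt:").
    """
    rows = [line.split() for line in text.splitlines() if line.strip()]
    sections = {}
    for names, values in zip(rows[0::2], rows[1::2]):
        section = names[0].rstrip(":")
        sections[section] = {n: int(v) for n, v in zip(names[1:], values[1:])}
    return sections


def listen_queue_counters(text):
    """Return the TcpExt listen-queue counters that are present in the text."""
    tcp_ext = parse_proc_netstat(text).get("TcpExt", {})
    return {name: tcp_ext[name]
            for name in ("ListenOverflows", "ListenDrops") if name in tcp_ext}


if __name__ == "__main__" and os.path.exists("/proc/net/netstat"):
    with open("/proc/net/netstat") as f:
        print(listen_queue_counters(f.read()))
```

If these counters stay flat while SYNs go unanswered, a full listen queue becomes a less likely explanation.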
A: 

Indeed, iptables turned out to be the culprit, specifically the rule that dropped INVALID packets. We still do not know for sure what made iptables consider those SYNs invalid (it was certainly not TIME_WAIT, since there had been no traffic from the same source ports for at least 30 minutes before the drops).
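A minimal Python sketch for spotting such rules in a ruleset, assuming it is dumped with `iptables-save` and that the rules use the `state` (`--state`) or `conntrack` (`--ctstate`) match; `invalid_drop_rules` is a hypothetical helper, and it does not follow jumps into user-defined chains:

```python
import re

# "--state" (state match) or "--ctstate" (conntrack match), optionally negated with "!".
STATE_RE = re.compile(r"(?:^|\s)(!\s+)?--(?:ct)?state\s+(\S+)")
# Jump target, e.g. "-j DROP".
TARGET_RE = re.compile(r"(?:^|\s)-j\s+(\S+)")


def invalid_drop_rules(iptables_save_text):
    """Return rule lines from iptables-save output that DROP packets in state INVALID."""
    found = []
    for line in iptables_save_text.splitlines():
        line = line.strip()
        if not line.startswith("-A "):  # only rule lines, not tables, chains or COMMIT
            continue
        state = STATE_RE.search(line)
        target = TARGET_RE.search(line)
        if not state or not target or state.group(1):  # skip negated "! --state"
            continue
        if "INVALID" in state.group(2).split(",") and target.group(1) == "DROP":
            found.append(line)
    return found
```

For example, save the output of `iptables-save` to a file and pass its contents to `invalid_drop_rules`. To see which packets such a rule is catching, one option is to insert a rule with the `LOG` target (with a distinctive `--log-prefix`) immediately before the DROP rule and compare the logged SYNs with the failed connection attempts.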