I have a .NET 3.5 C# application that sends 2000-6000 byte packets to a Linux machine running SLES 10. The machines are on the same subnet.
About 90% of the time, everything works fine: the Linux machine processes my request and responds in 5-15 ms. But about 10% of the time, there is a delay of approximately 200-800 ms.
Looking at the logs on the Linux machine, it seems the delay is on my end. That is, if my call to socket.Send(...) returns at 1:15:00.000 and I get a response at 1:15:00.210, the log on the Linux machine says that it received the request at 1:15:00.200 and then processed it in 10 ms. (I'm using System.Diagnostics.Stopwatch for timing on my machine.)
To debug, I captured the traffic using Wireshark. Here is the traffic. Between No. 8 and No. 9 is where a 600 ms delay occurs. (137.34.210.108 is my machine and 137.34.210.95 is the Linux machine.)
"1","11:56:27.380318","137.34.210.95","137.34.210.108","TCP","20700 > 17479 [PSH, ACK] Seq=1 Ack=1 Win=32767 Len=76"
"2","11:56:27.380393","HewlettP_29:37:0f","Broadcast","ARP","Who has 137.34.210.95? Tell 137.34.210.108"
"3","11:56:27.380558","HewlettP_29:39:93","HewlettP_29:37:0f","ARP","137.34.210.95 is at 00:1b:78:29:39:93"
"4","11:56:27.380564","137.34.210.108","137.34.210.95","TCP","17479 > 20700 [ACK] Seq=1 Ack=77 Win=65459 [TCP CHECKSUM INCORRECT] Len=0"
"5","12:04:48.096892","HewlettP_29:37:0f","Broadcast","ARP","Who has 137.34.210.95? Tell 137.34.210.108"
"6","12:04:48.097216","HewlettP_29:39:93","HewlettP_29:37:0f","ARP","137.34.210.95 is at 00:1b:78:29:39:93"
"7","12:04:48.097229","137.34.210.108","137.34.210.95","TCP","17480 > 20600 [PSH, ACK] Seq=1 Ack=1 Win=64198 [TCP CHECKSUM INCORRECT] Len=458"
"8","12:04:48.097457","137.34.210.95","137.34.210.108","TCP","20600 > 17480 [ACK] Seq=1 Ack=4294964377 Win=32767 Len=0 SLE=1 SRE=459"
"9","12:04:49.700966","137.34.210.108","137.34.210.95","TCP","17479 > 20700 [ACK] Seq=1 Ack=77 Win=65459 [TCP CHECKSUM INCORRECT] Len=1460"
"10","12:04:49.701190","137.34.210.108","137.34.210.95","TCP","[TCP Retransmission] 17480 > 20600 [ACK] Seq=4294964377 Ack=1 Win=64198 [TCP CHECKSUM INCORRECT] Len=1460"
"11","12:04:49.703970","137.34.210.95","137.34.210.108","TCP","20600 > 17480 [ACK] Seq=1 Ack=4294965837 Win=32767 Len=0 SLE=1 SRE=459"
"12","12:04:49.703993","137.34.210.108","137.34.210.95","TCP","[TCP Retransmission] 17480 > 20600 [ACK] Seq=4294965837 Ack=1 Win=64198 [TCP CHECKSUM INCORRECT] Len=1460"
"13","12:04:49.704002","137.34.210.108","137.34.210.95","TCP","[TCP Retransmission] 17480 > 20600 [PSH, ACK] Seq=1 Ack=1 Win=64198 [TCP CHECKSUM INCORRECT] Len=458"
"14","12:04:49.704211","137.34.210.95","137.34.210.108","TCP","20600 > 17480 [ACK] Seq=1 Ack=459 Win=32767 Len=0"
"15","12:04:49.704215","137.34.210.95","137.34.210.108","TCP","[TCP Dup ACK 14#1] 20600 > 17480 [ACK] Seq=1 Ack=459 Win=32767 Len=0 SLE=1 SRE=459"
"16","12:04:49.705425","137.34.210.95","137.34.210.108","TCP","20700 > 17479 [PSH, ACK] Seq=77 Ack=1461 Win=32767 Len=44"
Can someone help me interpret this? I see that a retransmission is occurring, but I'm not sure why. The switch shows no dropped packets. And even if packets are being lost, why would it take 600 ms to retransmit?
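As a sanity check on the size of the stall, the timestamps in the export can be diffed mechanically. The following is a minimal Python sketch, not part of the original application: it assumes the rows are the six-column CSV shown above (No., time, source, destination, protocol, info), and the helper names `parse_rows` and `gaps` are made up for illustration. Only rows 7-10 are embedded, copied verbatim. On the capture as pasted, the gap between No. 8 and No. 9 comes out at about 1.6 s rather than 600 ms, so the figure is worth re-checking against the original capture file.

```python
import csv
import io
from datetime import datetime

# Rows 7-10 of the capture, copied verbatim from the export above.
CAPTURE = '''\
"7","12:04:48.097229","137.34.210.108","137.34.210.95","TCP","17480 > 20600 [PSH, ACK] Seq=1 Ack=1 Win=64198 [TCP CHECKSUM INCORRECT] Len=458"
"8","12:04:48.097457","137.34.210.95","137.34.210.108","TCP","20600 > 17480 [ACK] Seq=1 Ack=4294964377 Win=32767 Len=0 SLE=1 SRE=459"
"9","12:04:49.700966","137.34.210.108","137.34.210.95","TCP","17479 > 20700 [ACK] Seq=1 Ack=77 Win=65459 [TCP CHECKSUM INCORRECT] Len=1460"
"10","12:04:49.701190","137.34.210.108","137.34.210.95","TCP","[TCP Retransmission] 17480 > 20600 [ACK] Seq=4294964377 Ack=1 Win=64198 [TCP CHECKSUM INCORRECT] Len=1460"
'''


def parse_rows(text):
    """Parse the six-column export into (packet number, timestamp) pairs."""
    rows = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        number, stamp = row[0], row[1]
        rows.append((int(number), datetime.strptime(stamp, "%H:%M:%S.%f")))
    return rows


def gaps(rows):
    """Return (previous No., current No., seconds between them) per adjacent pair."""
    return [
        (prev[0], cur[0], (cur[1] - prev[1]).total_seconds())
        for prev, cur in zip(rows, rows[1:])
    ]


rows = parse_rows(CAPTURE)
largest = max(gaps(rows), key=lambda gap: gap[2])
print(f"largest gap: No. {largest[0]} -> No. {largest[1]}: {largest[2]:.3f} s")
```

Pasting the full export into `CAPTURE` would list every gap. Rows 1-4 were captured about eight minutes before row 5, so that pair will dominate unless it is filtered out.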
I thought that this (http://support.microsoft.com/kb/328890) might have something to do with the 200 ms delays, but I've tried changing TcpAckFrequency and it didn't help.
Thanks, Mike