views:

1562

answers:

4

I have a .NET 3.5 C# application that sends 2000-6000 byte packets to a linux machine running sles 10. The machines are on the same subnet.

About 90% of the time, everything works fine. The linux machine processes my request and responds in 5-15ms. But about 10% of the time, there is an approx 200ms-800ms delay.

Looking at the logs on the linux machine, it seems the delay is on my end. That is, if my call to socket.Send(...) returns at 1:15:00.000 and I get a response at 1:15:00.210, the log on the linux machine says that it received the request at 1:15:00.200 and then processed it in 10ms. (I'm using System.Diagnostics.Stopwatch for timing on my machine.)

To debug, I captured the traffic using wireshark. Here is the traffic. Between No. 8 and 9 is where a 600 ms delay occurs. (137.34.210.108 is my machine and 137.34.210.95 is the linux machine).

"1","11:56:27.380318","137.34.210.95","137.34.210.108","TCP","20700 > 17479 [PSH, ACK] Seq=1 Ack=1 Win=32767 Len=76"
"2","11:56:27.380393","HewlettP_29:37:0f","Broadcast","ARP","Who has 137.34.210.95?  Tell 137.34.210.108"
"3","11:56:27.380558","HewlettP_29:39:93","HewlettP_29:37:0f","ARP","137.34.210.95 is at 00:1b:78:29:39:93"
"4","11:56:27.380564","137.34.210.108","137.34.210.95","TCP","17479 > 20700 [ACK] Seq=1 Ack=77 Win=65459 [TCP CHECKSUM INCORRECT] Len=0"
"5","12:04:48.096892","HewlettP_29:37:0f","Broadcast","ARP","Who has 137.34.210.95?  Tell 137.34.210.108"
"6","12:04:48.097216","HewlettP_29:39:93","HewlettP_29:37:0f","ARP","137.34.210.95 is at 00:1b:78:29:39:93"
"7","12:04:48.097229","137.34.210.108","137.34.210.95","TCP","17480 > 20600 [PSH, ACK] Seq=1 Ack=1 Win=64198 [TCP CHECKSUM INCORRECT] Len=458"
"8","12:04:48.097457","137.34.210.95","137.34.210.108","TCP","20600 > 17480 [ACK] Seq=1 Ack=4294964377 Win=32767 Len=0 SLE=1 SRE=459"
"9","12:04:49.700966","137.34.210.108","137.34.210.95","TCP","17479 > 20700 [ACK] Seq=1 Ack=77 Win=65459 [TCP CHECKSUM INCORRECT] Len=1460"
"10","12:04:49.701190","137.34.210.108","137.34.210.95","TCP","[TCP Retransmission] 17480 > 20600 [ACK] Seq=4294964377 Ack=1 Win=64198 [TCP CHECKSUM INCORRECT] Len=1460"
"11","12:04:49.703970","137.34.210.95","137.34.210.108","TCP","20600 > 17480 [ACK] Seq=1 Ack=4294965837 Win=32767 Len=0 SLE=1 SRE=459"
"12","12:04:49.703993","137.34.210.108","137.34.210.95","TCP","[TCP Retransmission] 17480 > 20600 [ACK] Seq=4294965837 Ack=1 Win=64198 [TCP CHECKSUM INCORRECT] Len=1460"
"13","12:04:49.704002","137.34.210.108","137.34.210.95","TCP","[TCP Retransmission] 17480 > 20600 [PSH, ACK] Seq=1 Ack=1 Win=64198 [TCP CHECKSUM INCORRECT] Len=458"
"14","12:04:49.704211","137.34.210.95","137.34.210.108","TCP","20600 > 17480 [ACK] Seq=1 Ack=459 Win=32767 Len=0"
"15","12:04:49.704215","137.34.210.95","137.34.210.108","TCP","[TCP Dup ACK 14#1] 20600 > 17480 [ACK] Seq=1 Ack=459 Win=32767 Len=0 SLE=1 SRE=459"
"16","12:04:49.705425","137.34.210.95","137.34.210.108","TCP","20700 > 17479 [PSH, ACK] Seq=77 Ack=1461 Win=32767 Len=44"

Can someone help me to interpret this? I see that a re-transmit is occurring. But I'm not sure why. The switch shows no dropped packets. And even if the packets are being lost, why would it take 600ms to re-transmit?

I thought that this (http://support.microsoft.com/kb/328890) might have something to do with the 200ms delays but I've tried changing the TcpAckFrequency and it didn't help.

Thanks, Mike

A: 

This only shows the TCP packets on the Linux machine, but I'd recommend to look at the ip stats with the 'netstat -s' command. One reason for the retransmissions might be socket buffer overflows, which will be shown with this command.

steve
Here is the output...TCP Statistics for IPv4 Active Opens = 14801 Passive Opens = 1168 Failed Connection Attempts = 116 Reset Connections = 33 Current Connections = 46 Segments Received = 64272975 Segments Sent = 20797063 Segments Retransmitted = 486UDP Statistics for IPv4 Datagrams Received = 25465 No Ports = 52039 Receive Errors = 7098 Datagrams Sent = 26223
Michael Covelli
IPv4 Statistics Packets Received = 64340244 Received Header Errors = 0 Received Address Errors = 3230 Datagrams Forwarded = 0 Unknown Protocols Received = 0 Received Packets Discarded = 38 Received Packets Delivered = 64329718 Output Requests = 20827053
Michael Covelli
Routing Discards = 0 Discarded Output Packets = 0 Output Packet No Route = 0 Reassembly Required = 15207 Reassembly Successful = 5051 Reassembly Failures = 2587 Datagrams Successfully Fragmented = 0 Datagrams Failing Fragmentation = 0 Fragments Created = 0
Michael Covelli
Thanks for taking a look, sorry for the poor formatting on the netstat -s output. As for the log from wireshark, I believe this shows both directions. The IP on the left is the source and the ip on the right is the destination.
Michael Covelli
Michael, these stats are cumulative , meaning when the machine was started. You need to measure the diff, so we get more accurate data. From the stats it is clear that there are retransmits.Did you increase the socket receive buffer size? Is the nagle algorithm enabled?
steve
Nagle is disabled for this connection (TcpNoDelay option). I will try to clear this and just capture one of these in the netstat output.I can try increasing the buffer size. Is changing it with the GetSocketOption/SetSocketOption generally preferred? Or changing it in the registry? What sizes are typical?
Michael Covelli
Where would a socket buffer overflow show up in the netstat statistics?
Michael Covelli
A: 

Let's start by pruning some of that Wireshark output. We can toss the ARPs in packets 2, 3, 5 and 6. Looking at the rest, you have two sets of traffic in there. Packets 8 and 9 are two different connections, so you can't compare them. 7, 8 and 10, however, are part of one connection so let's examine those.

Packet 7 is 458 bytes of data being sent to the Linux box with a TCP sequence number of 1. However, the ACK that the Linux box returns is 4294964377. This means that Wireshark is showing relative TCP values and that the Linux box is not sending an ACK for packet 7, but for an earlier packet. Your PC then waits for a follow-up ACK and, when it doesn't get one, retransmits the required data. In this case the 458 bytes from packet 7 along with a previous 1002 bytes. That's why the sequence number from packet 10 matches the ACK from packet 8.

Unfortunately this doesn't tell you why data is being dropped. Packet 8 shows the Linux box indicating it still has a full 32k of input buffer available for this connection ("Win=32767").

Chris Ryding
Thanks so much. My mistake was forgetting that I was sending two packets at the same time to that machine, one on port 20600 and one on 20700. So I had confused the two conversations. Now I guess my question is: how can I make the re-transmission faster? Why the long pause between the duplicate ack for an earlier packet and the re-transmission from my end? How can I make this faster?
Michael Covelli
Can you run a capture on both your PC and the Linux box at the same time? This looks like a classic case of a packet not reaching its destination. I assume this connection is over a LAN with some sort of switch. If that's the case, maybe you have some other tools to monitor overall network congestion.I'm not sure if you can change the retransmit timeout, but in any event that seems to be treating symptoms and not the cause.
Chris Ryding
Actually, I did manage to drastically reduce the number of transmissions today by increasing the ARP cache timeout. I noticed whenever more than 2 minutes elapsed between sends on the socket, the probability of a re-transmit increased dramatically. I looked for anything with a period of 2 minutes and found the ARP cache and increased it and there weren't any issues for the past several hours. But that still leaves two question - 1) Why does increasing the ARP cache timeout fix it? 2) Is there a way to shorten the re-transmit time for those times when it still does happen.
Michael Covelli
Ah, so the delay is tied to your PC performing an ARP resolution when the ARP cache entry expires. Basically the PC has to figure out the hardware address of the Linux box again when the entry expires and will hold any pending transmit until that happens. There's a couple of kludges you can try to minimize issues: ping the Linux box once a minute or so, add a static ARP entry on your PC or increase the cache timeout to some silly amount like 8 hours. Not good for actual deployments, though.
Chris Ryding
A: 

I don't recall if Windows has it, but on UNIX you'd enable TCP_NODELAY.

This disables TCP's Nagle Algorithm which makes the system wait for a small time in case more data is going to be added to the transmit buffer.

int nodelay = 1;
setsockopt(s, IPPROTO_TCP, TCP_NODELAY, &nodelay, sizeof(nodelay));
Alnitak
Nagling is dis-abled. Syntax is very similar in .NETsock = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp); sock.NoDelay = true;
Michael Covelli
A: 

http://support.microsoft.com/kb/214397

Moritz