views:

68

answers:

1

Hello

I have a client/server app (Java) that I'm migrating from Solaris to RH Linux. since I started running it in RH, I noticed some issues related to latency. I managed to isolate the problem that looks like this:

  • client sends 5 messages (32 bytes each) in a row (same application timestamp) to the server.
  • server echos messages.
  • client receives replies and prints round trip time for each msg.

in Solaris, all is well: I get ALL 5 replies at the same time, roughly 80ms after having sent original messages (client & server are several thousands miles away from each other: my ping RTT is 80ms, all normal).

in RH, first 3 messages are echoed normally (they arrive 80ms after they've been sent), however the following 2 arrive 80ms later (so total 160ms RTT).

the pattern is always the same. clearly looked like a TCP problem.

on my solaris box, I had previously configured the tcp stack with 2 specific options:

  1. disable nagle algorithm globally
  2. set tcp_deferred_acks_max to 0

on RH, it's not possible to disable nagle globally, but I disabled it on all of my apps' sockets (TCP_NODELAY).

so I started playing with tcpdump (on the server machine), and compared both outputs:

SOLARIS:

 22 2.085645    client         server          TCP      56150 > 6006 [PSH, ACK] Seq=111 Ack=106 Win=66672 Len=22    "MSG_1 RCV"
 23 2.085680    server         client          TCP      6006 > 56150 [ACK] Seq=106 Ack=133 Win=50400 Len=0
 24 2.085908    client         server          TCP      56150 > 6006 [PSH, ACK] Seq=133 Ack=106 Win=66672 Len=22    "MSG_2 RCV"
 25 2.085925    server         client          TCP      6006 > 56150 [ACK] Seq=106 Ack=155 Win=50400 Len=0
 26 2.086175    client         server          TCP      56150 > 6006 [PSH, ACK] Seq=155 Ack=106 Win=66672 Len=22    "MSG_3 RCV"
 27 2.086192    server         client          TCP      6006 > 56150 [ACK] Seq=106 Ack=177 Win=50400 Len=0
 28 2.086243    server         client          TCP      6006 > 56150 [PSH, ACK] Seq=106 Ack=177 Win=50400 Len=21    "MSG_1 ECHO"
 29 2.086440    client         server          TCP      56150 > 6006 [PSH, ACK] Seq=177 Ack=106 Win=66672 Len=22    "MSG_4 RCV"
 30 2.086454    server         client          TCP      6006 > 56150 [ACK] Seq=127 Ack=199 Win=50400 Len=0
 31 2.086659    server         client          TCP      6006 > 56150 [PSH, ACK] Seq=127 Ack=199 Win=50400 Len=21    "MSG_2 ECHO"
 32 2.086708    client         server          TCP      56150 > 6006 [PSH, ACK] Seq=199 Ack=106 Win=66672 Len=22    "MSG_5 RCV"
 33 2.086721    server         client          TCP      6006 > 56150 [ACK] Seq=148 Ack=221 Win=50400 Len=0
 34 2.086947    server         client          TCP      6006 > 56150 [PSH, ACK] Seq=148 Ack=221 Win=50400 Len=21    "MSG_3 ECHO"
 35 2.087196    server         client          TCP      6006 > 56150 [PSH, ACK] Seq=169 Ack=221 Win=50400 Len=21    "MSG_4 ECHO"
 36 2.087500    server         client          TCP      6006 > 56150 [PSH, ACK] Seq=190 Ack=221 Win=50400 Len=21    "MSG_5 ECHO"
 37 2.165390    client         server          TCP      56150 > 6006 [ACK] Seq=221 Ack=148 Win=66632 Len=0
 38 2.166314    client         server          TCP      56150 > 6006 [ACK] Seq=221 Ack=190 Win=66588 Len=0
 39 2.364135    client         server          TCP      56150 > 6006 [ACK] Seq=221 Ack=211 Win=66568 Len=0

REDHAT:

 17 2.081163    client         server          TCP      55879 > 6006 [PSH, ACK] Seq=111 Ack=106 Win=66672 Len=22    "MSG_1 RCV"
 18 2.081178    server         client          TCP      6006 > 55879 [ACK] Seq=106 Ack=133 Win=5888 Len=0
 19 2.081297    server         client          TCP      6006 > 55879 [PSH, ACK] Seq=106 Ack=133 Win=5888 Len=21 "MSG_1 ECHO"
 20 2.081711    client         server          TCP      55879 > 6006 [PSH, ACK] Seq=133 Ack=106 Win=66672 Len=22    "MSG_2 RCV"
 21 2.081761    client         server          TCP      55879 > 6006 [PSH, ACK] Seq=155 Ack=106 Win=66672 Len=22    "MSG_3 RCV"
 22 2.081846    server         client          TCP      6006 > 55879 [PSH, ACK] Seq=127 Ack=177 Win=5888 Len=21 "MSG_2 ECHO"
 23 2.081995    server         client          TCP      6006 > 55879 [PSH, ACK] Seq=148 Ack=177 Win=5888 Len=21 "MSG_3 ECHO"
 24 2.082011    client         server          TCP      55879 > 6006 [PSH, ACK] Seq=177 Ack=106 Win=66672 Len=22    "MSG_4 RCV"
 25 2.082362    client         server          TCP      55879 > 6006 [PSH, ACK] Seq=199 Ack=106 Win=66672 Len=22    "MSG_5 RCV"
 26 2.082377    server         client          TCP      6006 > 55879 [ACK] Seq=169 Ack=221 Win=5888 Len=0
 27 2.171003    client         server          TCP      55879 > 6006 [ACK] Seq=221 Ack=148 Win=66632 Len=0
 28 2.171019    server         client          TCP      6006 > 55879 [PSH, ACK] Seq=169 Ack=221 Win=5888 Len=42 "MSG_4 ECHO + MSG_5 ECHO"
 29 2.257498    client         server          TCP      55879 > 6006 [ACK] Seq=221 Ack=211 Win=66568 Len=0

so, I got confirmation things are not working correctly for RH: packet 28 is sent TOO LATE, it looks like the server is waiting for packet 27's ACK before doing anything.

seems to me it's the most likely reason...

then I realized that the "Win" parameters are different on Solaris & RH dumps: 50400 on Solaris, only 5888 on RH. that's another hint...

I read the doc about the slide window & buffer window, and played around with the rcvBuffer & sendBuffer in java on my sockets, but never managed to change this 5888 value to anything else (I checked each time directly with tcpdump).

does anybody know how to do this ? I'm having a hard time getting definitive information, as in some cases there's "auto-negotiation" that I might need to bypass, etc...

I eventually managed to get only partially rid of my initial problem by setting the "tcp_slow_start_after_idle" parameter to 0 on RH, but it did not change the "win" parameter at all. the same problem was there for the first 4 groups of 5 messages, with TCP retransmission & TCP Dup ACK in tcpdump, then the problem disappeared altogether for all following groups of 5 messages.

It doesn't seem like a very clean and/or generic solution to me. I'd really like to reproduce the exact same conditions under both OSes.

I'll keep researching, but any help from TCP gurus would be greatly appreciated !

thanks !

+1  A: 

It looks like the congestion avoidance algorithm is kicking in on the Red Hat box.

Notice that as of packet 26, the server has seen and ACKed everything from the client, but the client has only ACKed the server's initial SYN - it hasn't ACKed any of the server's messages yet. Notice also that packet 27, which kick-starts things again, is the client acknowledging the server's first two lots of data (packets 19 and 22).

Which congestion control algorithm is the Red Hat box using? (/proc/sys/net/ipv4/tcp_congestion_control) - you could try switching to one of the other available ones.

caf
yes it's the point I'm concerned about, I see a lot of echos are not ACKed by the client, and imagine some mechanism stops the server from sending anything anymore until ACKs are received: it only sends the 4th and 5th echo AFTER it received an ACK.just don't know too well what is causing this mechanism and how to influence it.will play around with congestion control, currently it's "cubic", will research... thanks !
Bastien
@Bastien: Try setting it to "reno" (just `echo "reno" > /proc/sys/net/ipv4/tcp_congestion_control`) and see if you like that behaviour better.
caf
changed it to "reno", no change in behaviour at all...
Bastien