views: 364
answers: 5

I have a Java program running on Windows (a Citrix machine) that dispatches requests to Java application servers on Linux; the dispatching mechanism is entirely custom.

The Windows Java program (let's call it W) opens a listening socket on a port assigned by the OS (say 1234) to receive results. It then invokes a "dispatch" service on the server with a "business request". That service splits the request, sends the pieces to other servers (call them S1 ... Sn), and synchronously returns the number of jobs to the client.
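For what it's worth, the receive path on W is roughly shaped like the sketch below (simplified; the class name, the length-prefixed wire format and the thread-per-connection layout are just illustrative, since the real code is custom):

    import java.io.DataInputStream;
    import java.net.ServerSocket;
    import java.net.Socket;

    // Illustrative sketch only; the real dispatch mechanism is custom.
    public class ResultListener {
        public static void main(String[] args) throws Exception {
            ServerSocket listener = new ServerSocket(0);   // port chosen by the OS (4373 in the netstat output below)
            int resultPort = listener.getLocalPort();      // sent along with the dispatch request
            System.out.println("Listening for results on port " + resultPort);
            while (true) {
                Socket s = listener.accept();              // one connection per job result
                new Thread(() -> {
                    try (DataInputStream in = new DataInputStream(s.getInputStream())) {
                        byte[] result = new byte[in.readInt()];
                        in.readFully(result);              // blocks in socketRead0; no SO_TIMEOUT is set
                        // hand the result to the business layer here
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }).start();
            }
        }
    }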

In my tests there are 13 jobs, dispatched to several servers. Within 2 seconds all the servers have finished processing their jobs and try to send their results back to W's socket.

I can see in the logs that 9 jobs are received by W (this number varies from test to test). So, I try to look for the 4 remaining jobs. If I do a netstat on this Windows box, I see that 4 sockets are open:

TCP    W:4373       S5:48197  ESTABLISHED
TCP    W:4373       S5:48198  ESTABLISHED
TCP    W:4373       S6:57642  ESTABLISHED
TCP    W:4373       S7:48295  ESTABLISHED

If I do a thread dump of W, I see 4 threads trying to read from these sockets, and apparently stuck in java.net.SocketInputStream.socketRead0(Native Method).

If I go onto each of the S boxes and run netstat, I see that some bytes are still sitting in the Send-Q, and that number does not move for 15 minutes. (The following is an aggregation of the netstat output from the different machines):

Proto Recv-Q Send-Q Local Address               Foreign Addr   State
tcp        0   6385 S5:48197                          W:4373   ESTABLISHED
tcp        0   6005 S5:48198                          W:4373   ESTABLISHED
tcp        0   6868 S6:57642                          W:4373   ESTABLISHED
tcp        0   6787 S7:48295                          W:4373   ESTABLISHED

If I do a thread dump of the servers, I see that those threads are also stuck in java.net.SocketInputStream.socketRead0(Native Method). I would have expected a write, but maybe they're waiting for an ACK? (I'm not sure here; would that even show up in Java? Shouldn't it be handled by the TCP stack directly?)

Now, the very strange thing is: after 15 minutes (and it's always 15 minutes), the results are received, sockets are closed, and everything continues as normal.

This always used to work. The S servers have since moved to a different data center, so W and S are no longer in the same data center, and S is now behind a firewall. All ports between S and W are supposed to be authorized (so I'm told). The real mystery is the 15-minute delay. Could it be some kind of protection against DDoS?

I'm no network expert, so I asked for help, but nobody is available. I spent 30 minutes with a guy capturing packets with Wireshark (formerly Ethereal), but for "security reasons" I'm not allowed to look at the capture; he has to analyze it and get back to me. I asked for the firewall logs; same story.

I'm not root or administrator on these boxes, so now I don't know what to do... I'm not expecting a full solution from you guys, but any ideas on how to make progress would be great!

+1  A: 

Are you missing a flush() on the S side after sending the response?

Peter
No: the same code is executed for the jobs that do arrive and it works fine. It also works in other environments, and it worked fine here in the past too. It's definitely a network issue.
Nicolas
+1  A: 

Right. If you're using a BufferedOutputStream, you need to call flush(); the buffer is only written out to the socket automatically once it fills up.
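Something along these lines (illustrative names only, obviously not your actual code):

    import java.io.BufferedOutputStream;
    import java.io.DataOutputStream;
    import java.io.IOException;
    import java.net.Socket;

    // Illustrative sketch of a buffered send path on the S side.
    public class ResultSender {
        static void sendResult(String wHost, int wPort, byte[] result) throws IOException {
            try (Socket socket = new Socket(wHost, wPort);
                 DataOutputStream out = new DataOutputStream(
                         new BufferedOutputStream(socket.getOutputStream()))) {
                out.writeInt(result.length);
                out.write(result);   // may sit in BufferedOutputStream's 8 KB buffer
                out.flush();         // pushes it down to the socket; close() also flushes,
                                     // but an explicit flush() matters if the stream stays open
            }
        }
    }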

Nick
Flush is called.
Nicolas
+2  A: 

If it worked OK on your local network, then I don't see this being a programming issue (re: the flush() comments).

Is network connectivity between the two machines otherwise normal? Can you transfer similar quantities of data via (say) FTP without a problem? Can you reproduce the issue by knocking together a quick client/server pair that just sends appropriately sized chunks of data (see the sketch below)? In other words, is the network connectivity between W and S actually good?
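For example, a throwaway pair along these lines would show whether a plain transfer of a few kilobytes from S to W also stalls (only a sketch; the class name, port and payload size are arbitrary):

    import java.io.InputStream;
    import java.io.OutputStream;
    import java.net.ServerSocket;
    import java.net.Socket;

    // Throwaway connectivity check. Run "java NetCheck receive 4444" on W,
    // then "java NetCheck send <W-host> 4444" on one of the S boxes.
    // 7000 bytes is roughly the amount you see stuck in the Send-Q.
    public class NetCheck {
        public static void main(String[] args) throws Exception {
            if (args[0].equals("receive")) {
                try (ServerSocket ss = new ServerSocket(Integer.parseInt(args[1]));
                     Socket s = ss.accept();
                     InputStream in = s.getInputStream()) {
                    byte[] buf = new byte[1024];
                    long start = System.currentTimeMillis();
                    int total = 0, n;
                    while ((n = in.read(buf)) != -1) {
                        total += n;
                    }
                    System.out.println(total + " bytes received in "
                            + (System.currentTimeMillis() - start) + " ms");
                }
            } else {
                try (Socket s = new Socket(args[1], Integer.parseInt(args[2]));
                     OutputStream out = s.getOutputStream()) {
                    out.write(new byte[7000]);
                    out.flush();
                }
            }
        }
    }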

Another question: you now have a firewall in between. Could that be a bottleneck that wasn't there before? (Not sure how that would explain the consistent 15-minute delay, though.)

Final question: what are your TCP configuration parameters set to (on both W and S; I'm thinking of the OS-level settings)? Is there anything there that would suggest or lead to a 15-minute figure?

Not sure if that's any help.

Brian Agnew
+1  A: 

Apart from trying what Brian said, you could also check the following:

1) Run tcpdump on one of the servers and look at the sequence of messages from the time a job is initiated until after the delay, when all processing is complete. That will tell you which side is causing the delay (W or S). Check whether there are any retransmissions, missed ACKs, and so on.

2) Is there some kind of fragmentation happening between W and S?

3) What are the network load conditions on the servers where the bytes are stuck? Is heavy load causing output errors, so that the socket queues are not being emptied? (There could also be a NIC bug where, after hitting some error condition, the NIC buffers are not flushed or transmission fails to resume, and the condition is eventually cleared by some sort of watchdog.)

More information on the above would definitely help.

Harty
I don't have sufficient privileges to run tcpdump, but that's what I'm trying to do with the network guys. I'm not sure about the network conditions: on the subnets where W and S sit, the load is low, but there might be a bottleneck in a router the packets go through.
Nicolas
A: 

Are you sure the threads stuck in read calls are the same threads that were sending the data? Is it possible that the threads actually involved are blocked on some other activity, and your stack dump shows innocent threads that just happen to be doing socket I/O? It's been a while since I worked with Java, but I vaguely remember the JVM using sockets for IPC.

I would examine all the threads on the receiving side to see whether the intended receiver is actually off doing something else for 15 minutes.

The fact that it works in one location vs another usually points to an application timing error, not a datacenter problem.

Yes, I'm sure. The threads in question are in a dedicated thread pool and are named distinctively. I don't follow the logic that a timing issue points to the application, but I agree that 15 minutes is quite a lot for a network timeout.
Nicolas
Hmm, check for closed TCP receive windows; that would indicate an application problem. Sniff on each host physically to get the true end-to-end picture of the traffic. Break the Windows app in a debugger for 20 minutes to see whether the other servers finish up anyway.