I have been doing socket programming for many years, but I have never had a missed message using TCP - until now. I have a Java server and a C client, both on localhost. They send short messages back and forth as strings, with some delays in between. In one particular case, a message never arrives on the client side. It is reproducible, but oddly machine-dependent.

To give some more details: I can debug the server side and see the send followed by the flush. I can attach to the client and step through the select calls (in a loop), but the message simply never shows up. Has anyone experienced this, and is there an explanation other than a coding error?

In other words, if you have a connected socket and do a write on one side and a read on the other, what can happen in the middle to cause something like this?

One other detail - I've used tcpdump on the loopback interface and can see the missed message.

+3  A: 

I've seen this happen in SMTP transactions before. Do you have a virus scanner running on that machine? If so, try turning it off and see if that makes a difference.

Otherwise, I'd suggest installing Wireshark so you can take a look at what's actually happening.

Jordan Stewart
He used tcpdump and saw the missing message. Wireshark gives more details but won't make a big difference.
bortzmeyer
+1  A: 

Finally - after sniffing some more, I found the problem. Two messages were sometimes (but rarely) sent before the client did a read, so a single read returned both of them, but only the first was handled. This is why it seemed as though the second message never arrived. It was buried in the receive buffer.
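
For anyone hitting the same thing, here is a minimal sketch of the fix: after each read, handle every complete message in the buffer and keep any unfinished tail for the next read. It assumes the messages are newline-terminated, which is an assumption for illustration and not necessarily how my protocol works. The names `drain_messages`, `msg_handler`, `msg_log` and `log_message` are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback type: invoked once per complete message. */
typedef void (*msg_handler)(const char *msg, size_t len, void *ctx);

/* Hypothetical context that counts messages and remembers the last one. */
struct msg_log {
    int  count;
    char last[64];
};

/* Example handler: copy the message (truncated if needed) into the log. */
static void log_message(const char *msg, size_t len, void *ctx)
{
    struct msg_log *log = ctx;
    if (len >= sizeof log->last)
        len = sizeof log->last - 1;
    memcpy(log->last, msg, len);
    log->last[len] = '\0';
    log->count++;
}

/*
 * Handle every complete '\n'-terminated message in buf[0..*len), then
 * move any trailing partial message to the front of the buffer so the
 * next read() can append to it. Returns the number of messages handled.
 * ASSUMPTION: '\n' is the message delimiter.
 */
static int drain_messages(char *buf, size_t *len, msg_handler handle, void *ctx)
{
    int count = 0;
    size_t start = 0;
    char *nl;

    assert(buf != NULL && len != NULL && handle != NULL);

    while ((nl = memchr(buf + start, '\n', *len - start)) != NULL) {
        size_t msg_len = (size_t)(nl - (buf + start));
        handle(buf + start, msg_len, ctx);
        start += msg_len + 1;   /* skip the message and its delimiter */
        count++;
    }

    /* Keep the unfinished tail for the next read(). */
    memmove(buf, buf + start, *len - start);
    *len -= start;
    return count;
}
```

The point is the `while` loop: my original code effectively had an `if` there, so the second message stayed in the buffer until something else arrived.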

AdamC
You've said you have been programming sockets for years and you didn't implement any kind of flow control, like STX/ETX or a message length prefix? Sounds fishy. You should never rely on the assumption that everything sent by one write() will be received by exactly one read(). Read UNIX Network Programming Vol. 1 by W. Richard Stevens - it is a truly great resource for network programmers.
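
A rough sketch of the length-prefix approach on the C side, assuming each message is sent as a 4-byte big-endian length followed by that many payload bytes (on the Java side, `DataOutputStream.writeInt` followed by the bytes would produce this layout). The wire format and the names `read_exact` and `read_message` are assumptions for illustration, not anything taken from the original code.

```c
#include <arpa/inet.h>   /* ntohl */
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>      /* read */

/*
 * Read exactly n bytes from fd, looping because read() on a stream
 * may return fewer bytes than requested.
 * Returns 0 on success, -1 on error or if the peer closed early.
 */
static int read_exact(int fd, void *buf, size_t n)
{
    char *p = buf;

    while (n > 0) {
        ssize_t got = read(fd, p, n);
        if (got < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal, retry */
            return -1;
        }
        if (got == 0)
            return -1;          /* EOF before the full count arrived */
        p += got;
        n -= (size_t)got;
    }
    return 0;
}

/*
 * Read one message framed as: 4-byte big-endian length, then payload.
 * ASSUMPTION: this framing is hypothetical, chosen for the example.
 * Returns the payload length, or -1 on error or if the payload would
 * not fit in out[0..cap). After -1 the stream should be treated as
 * unusable, since the reader is no longer aligned on a frame boundary.
 */
static long read_message(int fd, char *out, size_t cap)
{
    uint32_t netlen;
    uint32_t len;

    assert(out != NULL);

    if (read_exact(fd, &netlen, sizeof netlen) != 0)
        return -1;
    len = ntohl(netlen);
    if (len > cap)
        return -1;
    if (read_exact(fd, out, len) != 0)
        return -1;
    return (long)len;
}
```

With this, two messages that arrive back to back still come out as two separate `read_message` calls, no matter how the bytes were split across write() and read().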
qrdl
qrdl makes an excellent point. UDP preserves message boundaries (1 write == 1 read, provided the datagram arrives at all), but TCP is a byte stream and explicitly does not (to enable better throughput).
j_random_hacker
Good point about flow control - that is a great way to prevent this kind of error, but calling bs on my experience is rude. It was such a stupid mistake that I almost didn't post the answer, for just this reason, but I wanted to give anyone else seeing this an idea of where to look.
AdamC