views: 19
answers: 1

I have a TCP/IP socket interface to a third-party software app. I've implemented this interface at several customer sites with no problem. The latest customer, though... problems. We've turned on logging in the apps on either end, and also installed Wireshark on the PC to log raw TCP/IP traffic. With that, we've proved that my server app successfully sends the message out and the PC receives it, but the client app doesn't see it. (This is a totally intermittent problem, which is why it's such a pain to troubleshoot.)

The socket details are as simple as they come: one socket handling two-way communication between the server and the PC. The messages are plain ASCII text and fairly short (not XML). The server initiates communication by sending the first message, and then the client responds with several messages. The socket is kept open at all times while the apps are running. The client app is designed so that the end user can only process one case at a time, which prevents message collisions. They have some sort of polling set up; their app "hibernates" until it sees the initiating message from the server.

The third-party vendor has advised me to add a delay of a few seconds before I send them the initiating message. I can't see how that helps. If the client is "sleeping", just polling the socket waiting for a message, how does adding a delay before the first message help? It's not like we send two messages and the second one gets lost; it's the first message that's being lost. So I don't see how it matters whether we send that message now or two seconds from now.

I've asked them and they haven't given me details. It could be some proprietary detail in their code that they don't want to disclose to me, and that's fair. So I'm asking here because I'm always learning new things about socket programming. Maybe you guys can shed some light on how polling a TCP/IP socket can be affected by message timing?

+2  A: 

Since it's someone else's client and they won't tell you what it's doing (other than saying 'insert a delay'), the answer is probably that their client is reading and discarding the message because it's not yet in a state to deal with it. The delay gives the client time to get into a state where it can respond to the message properly.

In other words, the client has a race condition. One easy way this can happen is if they have one thread for reading messages and another for dealing with them, and the reader can pull data off the socket before the handler is ready for it.

Short of running strace(1) on the client to see what system calls it is making, it's tough to tell what the client is actually doing.
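
To illustrate the kind of race described above (this is only a guess at the client's structure, not the vendor's actual code; the host, port, and timings are made up), a reader thread can pull the first message off the socket before the handler exists, so the message is silently dropped:

    import socket
    import threading
    import time

    handler = None  # set later, once the rest of the client has initialized

    def reader(sock):
        # Reader thread: pulls data off the socket as soon as it connects.
        while True:
            data = sock.recv(4096)
            if not data:
                break
            if handler is None:
                continue  # arrived before the client was ready -- read and dropped
            handler(data)

    def main():
        global handler
        sock = socket.create_connection(("server.example", 5000))  # hypothetical host/port
        threading.Thread(target=reader, args=(sock,), daemon=True).start()
        time.sleep(2)  # slow startup: loading config, UI, database, etc.
        handler = lambda msg: print("processing", msg.decode("ascii"))
        time.sleep(60)  # keep the process alive for the demo

    if __name__ == "__main__":
        main()

A sender-side delay longer than that startup window hides the bug, which would explain the vendor's advice without fixing the underlying race.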

Chris Dodd
Since this is a timing issue where delays help the client read better, another possibility is that the client may simply not be reading correctly to begin with. It could be reading more bytes at one time than it actually knows how to handle, not taking into account that an individual read can (and usually does) return bytes belonging to more than one message, and that the client is responsible for buffering socket data so that it can detect the boundaries between messages correctly (see the sketch below). Many beginner socket programmers assume that 1 write = 1 read, and that is simply not the case.
Remy Lebeau - TeamB
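
A rough sketch of what boundary-aware reading looks like on the receiving side, assuming the protocol ends each message with a newline (the delimiter, host, and port here are my assumptions, not anything from the vendor's spec):

    import socket

    def read_messages(sock):
        # recv() returns whatever bytes are available: part of a message,
        # exactly one, or several at once. Buffer the data and split on the
        # delimiter instead of treating each recv() as one message.
        buffer = b""
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                break  # peer closed the connection
            buffer += chunk
            while b"\n" in buffer:
                message, buffer = buffer.split(b"\n", 1)
                yield message.decode("ascii")

    # usage (hypothetical host/port):
    # sock = socket.create_connection(("server.example", 5000))
    # for msg in read_messages(sock):
    #     handle(msg)
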
If they have a race condition, wouldn't a better suggestion be to send my message twice? After all, what's to prevent a race condition two seconds from now, after my delay has completed?
Carrie Cobol