I have a program in C++, using the standard socket API, running on Ubuntu 7.04, that holds open a socket to a server. My system lives behind a router. I want to figure out how long it could take to get a socket error once my program starts sending AFTER the router is cut off from the net.

That is, my program may go idle (waiting for the user). The router is disconnected from the internet, and then my program tries to communicate over that socket.

Obviously it's not going to find out quickly, because TCP is quite adept at keeping a socket alive under adverse network conditions. It will retransmit many times, in several ways, before it finally gives up.

I need to establish some kind of 'worst case' time that I can give to the QA group (and the customer), so that they can test that my code goes into a proper offline state.

(For reference, my program is part of a pay-at-pump system for gas stations, and the server is the system that authorizes payment transactions. It's entirely possible for the station to be cut off from the net for a variety of reasons, and the customer just wants to know what to expect.)

EDIT: I wasn't clear. There's no human being waiting on this thing; this is just for a back-office note that the system is offline. When the auth doesn't come back in 30 seconds, the transaction is over and the people go off to do other things.

EDIT: I've come to the conclusion that the question isn't really answerable in the general case. The number of factors involved in determining how long a TCP connection takes to error out due to a downstream failure is too dependent on the exact equipment and failure for there to be a simple answer.

+1  A: 

I would twist the question around the other way: how long is a till operator prepared to stand there looking stupid in front of the customer before they say, "oh, it must not be working, let's do this the manual way"?

So pick some time, like 1 minute (assuming your network connection isn't the auto-disconnect kind that reconnects when traffic occurs).

Then use that time as how long your program waits before giving up: close the socket, display an error message, and so on. Maybe even show a countdown timer while waiting, so the till operator has an idea of how much longer the system is going to wait...

Then they know the transaction failed, and that it's manual time.

Otherwise, depending on your IP stack, the worst-case timeout could be "never times out".

Simeon Pilgrim
I've edited my question: There's no human waiting on this, the transaction times out nice and quick.
Michael Kohne
+1  A: 

You should be able to use:

http://linux.die.net/man/2/getsockopt

with:

SO_RCVTIMEO and SO_SNDTIMEO

to determine the timeouts involved.

This link: http://linux.die.net/man/7/socket

talks about more options that may be of interest to you.
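
For what it's worth, a minimal sketch of reading and then overriding the send timeout on Linux might look like this (the connected descriptor sock is assumed, and the function name is just for illustration):

    #include <sys/socket.h>
    #include <sys/time.h>
    #include <cstdio>

    // Assumes 'sock' is an already-connected TCP socket descriptor.
    void show_and_set_send_timeout(int sock)
    {
        timeval tv;
        socklen_t len = sizeof(tv);

        // Read the current send timeout (0 normally means "block forever").
        if (getsockopt(sock, SOL_SOCKET, SO_SNDTIMEO, &tv, &len) == 0)
            std::printf("current SO_SNDTIMEO: %ld.%06ld s\n",
                        (long)tv.tv_sec, (long)tv.tv_usec);

        // Give send() a 30-second ceiling; after that it fails with
        // EAGAIN/EWOULDBLOCK instead of blocking indefinitely.
        tv.tv_sec  = 30;
        tv.tv_usec = 0;
        if (setsockopt(sock, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof(tv)) != 0)
            std::perror("setsockopt(SO_SNDTIMEO)");
    }

One caveat: SO_SNDTIMEO only bounds how long a blocking send() waits for buffer space. It doesn't shorten TCP's own retransmission back-off, so data that was already queued can still linger until the stack gives up on its own schedule.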

In my experience, just picking a time is usually a bad idea. Even when it sounds reasonable, arbitrary timeouts tend to misbehave in practice; the usual result is an application that becomes unusable when the environment falls outside the norm.

Especially for financial transactions, this should be avoided. Perhaps providing a cancel button and some indication that the transaction is taking longer than expected would be a better solution.

Christopher
+1  A: 

I think the best approach is not to try and determine the timeout being used, but to actually specify the timeout yourself.

Depending on your OS, you can either:

  • use setsockopt() with the SO_SNDTIMEO option,
  • use non-blocking send() and then use select() with a timeout (sketched below), or
  • use non-blocking send(), and have a timeout on receiving the expected data.
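
As a rough illustration of the second option, a sketch along these lines could work (the helper name send_with_timeout is made up for this example, and it assumes sock has already been made non-blocking with fcntl()):

    #include <cstddef>
    #include <cerrno>
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <sys/select.h>
    #include <sys/time.h>

    // Assumes 'sock' is a connected, non-blocking socket.
    // Returns bytes sent, 0 on timeout, or -1 on a hard error.
    ssize_t send_with_timeout(int sock, const char* buf, size_t len, int timeout_sec)
    {
        ssize_t n = send(sock, buf, len, 0);
        if (n >= 0)
            return n;                             // queued (possibly partially) straight away
        if (errno != EAGAIN && errno != EWOULDBLOCK)
            return -1;                            // hard error: treat the link as down

        // The kernel send buffer is full; wait until the socket is
        // writable again, or until our own deadline expires.
        fd_set wfds;
        FD_ZERO(&wfds);
        FD_SET(sock, &wfds);
        timeval tv;
        tv.tv_sec  = timeout_sec;
        tv.tv_usec = 0;

        int r = select(sock + 1, NULL, &wfds, NULL, &tv);
        if (r == 0)
            return 0;                             // timed out: go to the offline state
        if (r < 0)
            return -1;

        return send(sock, buf, len, 0);           // writable again; retry once
    }

Note that send() will normally succeed as long as there is room in the kernel's send buffer, even if the router is already cut off, so in practice you'll probably also want the third option: an application-level timer on the authorization reply (the 30 seconds mentioned in the question).
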
Roddy