Background:

We have a client/server application that uses a persistent connection to the server.

Benchmarks show that it is many times faster to use an already open connection rather than spend significant time (2.5 seconds) setting up a new connection (crypto).

Unfortunately, the old connection may be stale.

Is there a way to wait for the system-level result of sending a message [either ACK or error]?

Waiting for read and then getting the end of stream causes confusion.

I know the message might be broken up into packets. It would suit my purposes equally well to know either if any part of the message was acked or if all of it was. The interesting problem here is stale connection.

A: 

At the layer where you handle the TCP communication, you don't have to bother with ACKs. That's the job of the transport layer (layer 4) in the OS network stack, beneath your code. Design your protocol so that every command gets a response. The response must be sent whether the command succeeded or failed. Interpreting the absence of a response as success is hazardous, because a broken connection produces the same effect.

I don't know exactly what you mean by stale. Because TCP is a connection-oriented protocol, the connection is either there or not. Therefore I do not quite understand what trouble you have: if you lose the connection, you have to make the effort of creating a new one.

jdehaan
Stale means the connection got snapped by a stateful firewall long ago but neither end knows it yet.
Joshua
(or sometimes by something else that doesn't respect keepalives)
Joshua
A: 

The TCP stack itself is unlikely to tell you in a timely manner if the connection is no longer viable (unless it's locally broken and the OS can report local failures to the stack). Due to retransmission timeouts (see here) and the like, it can take quite some time before a write returns a failure to indicate that the connection is broken. This is, of course, by design; it's just that the design is at odds with what you want to do.

You could try using TCP keepalives, but IMHO these aren't really worth the effort. You'd be better off implementing some form of application-level ACK, if possible, so that the peer's application-level protocol sends back a response as soon as it gets some data from you. If you can't do that, and your request elicits a response from the other end of the connection, then you can simply set up a timer based on how long you're prepared to wait for the response before assuming the connection is dead. Once the timer expires, you close your connection and establish a new one.
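The response timer described above can be sketched roughly as follows. This is only an illustration under assumptions: `request_with_deadline` and its parameters are made-up names, the reply is assumed to fit in a single `recv()`, and a real protocol would need proper message framing.

```python
import socket

def request_with_deadline(sock, payload, timeout_s, max_reply=4096):
    """Send payload and wait up to timeout_s for the first reply bytes.

    Returns the reply, or None when the connection should be treated as dead
    (timeout, socket error, or end of stream). Hypothetical helper.
    """
    try:
        sock.settimeout(timeout_s)   # applies to both sendall() and recv()
        sock.sendall(payload)
        reply = sock.recv(max_reply)
    except OSError:                  # includes socket.timeout / TimeoutError
        return None
    return reply or None             # b"" means the stream was closed
```

When this returns `None`, the caller would close the socket, open a fresh connection, and decide whether resending the request is safe.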

It may be that you could fire off a new connection attempt and also send the request on the old connection. If the new connection becomes available before you get a response from the old one, then you can send the request on the new connection and close the old one. Of course, this relies on your application being able to deal with this kind of thing (for example, the same request arriving twice).
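A rough sketch of that race, using threads. All three callables are hypothetical placeholders supplied by the caller: `try_old` sends the request on the existing connection and returns the reply (or `None`, or raises, if the link is dead), `connect_new` performs the slow connection setup, and `send_on_new` sends the request over the fresh connection. It assumes `try_old` has its own timeout so its thread eventually finishes.

```python
import concurrent.futures as cf

def race_old_against_new(try_old, connect_new, send_on_new):
    """Return (winner, reply), where winner is "old" or "new"."""
    pool = cf.ThreadPoolExecutor(max_workers=2)
    try:
        old_f = pool.submit(try_old)       # request on the existing connection
        new_f = pool.submit(connect_new)   # slow setup runs in parallel
        pending = {old_f, new_f}
        while pending:
            done, pending = cf.wait(pending, return_when=cf.FIRST_COMPLETED)
            # Prefer the old connection whenever it produced a real reply.
            if (old_f in done and old_f.exception() is None
                    and old_f.result() is not None):
                return "old", old_f.result()
            # Otherwise use the new connection as soon as it is ready.
            if new_f in done and new_f.exception() is None:
                return "new", send_on_new(new_f.result())
        raise ConnectionError("both the old and the new connection failed")
    finally:
        pool.shutdown(wait=False)          # do not block on the loser
```

The sketch leaves cleanup to the caller: closing the old connection when "new" wins, and keeping or closing the spare connection when "old" wins. If the old connection was merely slow, the server may see the request twice.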

Finally, if the connection is being broken due to inactivity, then perhaps you could add an application-level ping to your protocol, sent every so often to a) ensure the connection is alive and b) stop routers or firewalls from thinking the connection is dead.
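One possible shape for such a ping, as a sketch only. The `PING`/`PONG` byte strings, `ping_once`, and the `Heartbeat` class are invented for illustration and are not part of any real protocol. The sketch also assumes nothing else is reading from the socket while a ping is in flight; a real client would have to serialize pings with normal requests.

```python
import socket
import threading

PING = b"PING\n"  # hypothetical wire format
PONG = b"PONG\n"

def ping_once(sock, timeout_s=1.0):
    """Return True if the peer answers PING with PONG within timeout_s."""
    try:
        sock.settimeout(timeout_s)
        sock.sendall(PING)
        return sock.recv(len(PONG)) == PONG
    except OSError:               # timeout, reset, broken pipe, ...
        return False

class Heartbeat:
    """Calls send_ping() every interval_s seconds until it fails or stop()."""

    def __init__(self, send_ping, interval_s, on_dead):
        self._send_ping = send_ping
        self._interval_s = interval_s
        self._on_dead = on_dead
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()

    def _run(self):
        # Event.wait() returns False on timeout and True once stop() is called.
        while not self._stop.wait(self._interval_s):
            if not self._send_ping():
                self._on_dead()   # e.g. mark the connection for replacement
                return
```

A caller might start it with something like `Heartbeat(lambda: ping_once(sock), 30.0, reconnect).start()`, where `reconnect` is the application's own callback and the interval is chosen to be shorter than the firewall's idle timeout.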

Len Holgate
A: 

Unfortunately, the old connection may be stale.

In which case you will get an exception when writing to it, eventually.

Is there a way to wait for the system-level result of sending a message [either ACK or error]?

No.

Waiting for read and then getting the end of stream causes confusion.

Confusion to whom? It's the code's job to handle confusion. If you get an unexpected EOS the peer has closed the connection, or an intermediate firewall has, in which case you have to deal with it.
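As an illustration of handling that case in code, here is a minimal sketch that treats an unexpected EOS (an empty `recv()`) like any other socket error, reconnects once, and resends. `StaleConnection`, `read_reply`, `send_with_one_retry`, and the `reconnect` callback are all hypothetical names, and resending is assumed to be safe for the request in question.

```python
import socket

class StaleConnection(Exception):
    """The stream ended before a reply arrived."""

def read_reply(sock, nbytes=4096):
    data = sock.recv(nbytes)
    if data == b"":               # EOS: peer or a middlebox closed the stream
        raise StaleConnection("connection closed before a reply arrived")
    return data

def send_with_one_retry(conn, reconnect, payload):
    """Return (connection_in_use, reply); reconnect() is called at most once."""
    for attempt in (1, 2):
        try:
            conn.sendall(payload)
            return conn, read_reply(conn)
        except (StaleConnection, OSError):
            conn.close()
            if attempt == 2:
                raise             # the fresh connection failed too
            conn = reconnect()    # pay the slow setup cost only when needed
```

The caller keeps using whichever connection is returned, so the expensive setup happens only after the cached connection has actually failed.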

I know the message might be broken up into packets.

Completely irrelevant. You don't have any control over that or any visibility of it either. What you get is a byte stream terminated by EOS or an exception.

It would suit my purposes equally well to know either if any part of the message was acked or if all of it was.

No it wouldn't. The ACK only means it has got as far as the peer's TCP/IP stack. What your application is interested in is whether it has gotten into the peer application, and only the peer application can tell you that, via an application-protocol-level ACK. TCP/IP ACKs are of no help here.

The interesting problem here is stale connection.

And that's a rather trivial problem. You can detect it in code and you can deal with it in code too. Database vendors have been doing this stuff for decades. It's not rocket science, and it requires no knowledge of the TCP ACKs.

EJP
"The ACK only means it has got as far as the peer's TCP/IP stack." Which is good enough for me. I'm trying to distinguish an extremely common failure from an extremely rare one.
Joshua