First off, yes, I am answering my own question. It's important to note, however, that without ajmastrean, I would be nowhere. Thank you so much!
ONE:
ConnectionFactory.SetReconnAttemptCount, SetReconnAttemptDelay, and SetReconnAttemptTimeout should be set so that the total reconnect window covers how long your servers actually take to fail over. I think the default values retry too quickly (on the order of 1/2 second between retries). Our EMS servers can take a long time to fail over because of network storage, etc., so 5 retries at 1/2 s intervals is nowhere near long enough.
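To see whether a given combination of settings is long enough, it helps to do the arithmetic. The sketch below is a hypothetical helper (not part of the TIBCO API) that estimates the reconnect window. It assumes each attempt waits the configured delay and may then take up to the configured timeout before failing, so the worst case is roughly count × (delay + timeout); check the EMS documentation for the exact semantics in your version.

```csharp
using System;

public static class ReconnectWindow
{
    // Rough worst-case estimate of how long the client keeps trying to reconnect.
    // Assumption: every attempt waits delayMs and may then take up to timeoutMs
    // before it fails, so the total is attemptCount * (delayMs + timeoutMs).
    public static TimeSpan Estimate(int attemptCount, int delayMs, int timeoutMs)
    {
        if (attemptCount < 0 || delayMs < 0 || timeoutMs < 0)
            throw new ArgumentException("All values must be non-negative.");
        long totalMs = (long)attemptCount * ((long)delayMs + timeoutMs);
        return TimeSpan.FromMilliseconds(totalMs);
    }

    // True if the estimated reconnect window is at least the expected failover time.
    public static bool Covers(TimeSpan expectedFailover, int attemptCount, int delayMs, int timeoutMs)
    {
        return Estimate(attemptCount, delayMs, timeoutMs) >= expectedFailover;
    }
}
```

For 5 attempts at 500 ms this gives only 2.5 seconds, while the settings used in the code further down (30 attempts, 120000 ms delay, 2000 ms timeout) give about 61 minutes.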
TWO:
I believe it's important to enable the client-to-server and server-to-client heartbeats. I wasn't able to verify this, but without them the client might not be notified that the server has gone offline or is switching over in failover mode. This, of course, is a server-side setting for EMS.
THREE:
You can watch for failover events by calling Tibems.SetExceptionOnFTSwitch(true); and then wiring up an exception event handler. In a single-server environment, you will see a "Connection has been terminated" message. In a fault-tolerant multi-server environment, you will instead see "Connection has performed fault-tolerant switch to " followed by the URL of the server it switched to. You don't strictly need this notification, but it can be useful (especially in testing).
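If you want to react differently to the two cases (for example, in a test harness), one option is to inspect the exception message. The helper below is a hypothetical sketch, not part of the TIBCO API: it matches on the two message texts quoted in this answer, which is brittle and may differ between EMS versions. In the handler you would pass it `args.Exception.Message`.

```csharp
using System;

// Hypothetical categories for the two messages described above.
public enum EmsConnectionEvent { Unknown, Terminated, FaultTolerantSwitch }

public static class EmsMessageClassifier
{
    // Message texts as quoted in this answer; matching on text is fragile,
    // so treat this as a testing/logging aid rather than control logic.
    private const string TerminatedText = "Connection has been terminated";
    private const string FtSwitchText = "Connection has performed fault-tolerant switch to";

    public static EmsConnectionEvent Classify(string message)
    {
        if (string.IsNullOrEmpty(message)) return EmsConnectionEvent.Unknown;
        if (message.Contains(FtSwitchText)) return EmsConnectionEvent.FaultTolerantSwitch;
        if (message.Contains(TerminatedText)) return EmsConnectionEvent.Terminated;
        return EmsConnectionEvent.Unknown;
    }

    // Returns the server URL that follows the fault-tolerant switch text, or null if absent.
    public static string SwitchTarget(string message)
    {
        if (string.IsNullOrEmpty(message)) return null;
        int i = message.IndexOf(FtSwitchText, StringComparison.Ordinal);
        if (i < 0) return null;
        return message.Substring(i + FtSwitchText.Length).Trim();
    }
}
```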
FOUR:
This isn't clear in the EMS documentation, but connection reconnect will NOT work in a single-server environment. You need to be in a multi-server, fault-tolerant environment. There is a trick, however: you can put the same server in the connection list twice. Strange, I know, but it works, and it enables the built-in reconnect logic.
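A minimal sketch of that trick, assuming the usual EMS convention of a comma-separated server URL list (as used for fault-tolerant pairs) is passed to the connection factory. `EmsServerList.Build` is a hypothetical helper, and `tcp://emshost:7222` is only an example URL.

```csharp
using System;

public static class EmsServerList
{
    // Joins server URLs into a comma-separated list for the connection factory.
    // With a single URL, that URL is listed twice, which (per this answer)
    // is enough to turn on the client's built-in reconnect logic.
    public static string Build(params string[] urls)
    {
        if (urls == null || urls.Length == 0)
            throw new ArgumentException("At least one server URL is required.", nameof(urls));
        string[] list = urls.Length == 1 ? new[] { urls[0], urls[0] } : urls;
        return string.Join(",", list);
    }
}
```

For example, `EmsServerList.Build("tcp://emshost:7222")` yields `tcp://emshost:7222,tcp://emshost:7222`, which can then be given to the `TopicConnectionFactory` constructor in place of a single URL.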
Some code:
```csharp
using System.Windows.Forms;   // MessageBox
using TIBCO.EMS;

// Members of your form class. The server URL, user name, and password are passed in.
private TopicConnectionFactory _ConnectionFactory;
private TopicConnection _Connection;

private void initEMS(string serverUrl, string userName, string password)
{
    // Raise an exception on the connection when a fault-tolerant switch happens.
    Tibems.SetExceptionOnFTSwitch(true);
    _ConnectionFactory = new TopicConnectionFactory(serverUrl);
    _ConnectionFactory.SetReconnAttemptCount(30);       // 30 retries
    _ConnectionFactory.SetReconnAttemptDelay(120000);   // 2 minutes between attempts
    _ConnectionFactory.SetReconnAttemptTimeout(2000);   // 2-second timeout per attempt
    _Connection = _ConnectionFactory.CreateTopicConnection(userName, password);
    _Connection.ExceptionHandler += new EMSExceptionHandler(_Connection_ExceptionHandler);
}

private void _Connection_ExceptionHandler(object sender, EMSExceptionEventArgs args)
{
    EMSException e = args.Exception;
    // e.Message: "Connection has been terminated" -- single-server failure
    // e.Message: "Connection has performed fault-tolerant switch to <server url>" -- fault-tolerant multi-server
    MessageBox.Show(e.ToString());
}
```