We have a TIBCO EMS solution that uses built-in server failover in a 2-4 server environment. If the TIBCO admins fail over services from one EMS server to another, connections are supposed to be transferred to the new server automatically at the EMS service level. For our C# applications using the EMS service, this is not happening: our user connections are not being transferred to the new server after failover, and we're not sure why.

Our application connects to EMS at startup only, so if the TIBCO admins fail over after users have started our application, the users need to restart the app in order to reconnect to the new server. (Our EMS connection uses a server string including all 4 production EMS servers. If the first attempt fails, it moves to the next server in the string and tries again.)

I'm looking for an automated approach that will attempt to reconnect to EMS periodically if it detects that the connection is dead, but I'm not sure how best to do that.

Any ideas? We are using TIBCO.EMS.dll version 4.4.2 and .Net 2.x (SmartClient app)

Any help would be appreciated.

+1  A: 

Client applications may receive notification of a failover by setting the tibco.tibjms.ft.switch.exception system property.

Perhaps the library needs that to work?

TheSoftwareJedi
+1  A: 

This post should sum up my current comments and explain my approach in more detail...

The TIBCO 'ConnectionFactory' and 'Connection' types are heavyweight, thread-safe types. TIBCO suggests that you maintain the use of one ConnectionFactory (per server configured factory) and one Connection per factory.

The server also appears to be responsible for in-place 'Connection' failover and re-connection, so let's confirm it's doing its job and then lean on that feature.

Creating a client side solution is going to be slightly more involved than fixing a server or client setup problem. All sessions you have created from a failed connection need to be re-created (not to mention producers, consumers, and destinations). There are no "reconnect" or "refresh" methods on either type. The sessions do not maintain a reference to their parent connection either.

You will have to manage a lookup of connection/session objects and go nuts re-initializing everyone, or implement some sort of session failure event handler that can get the new connection and reconnect them.
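One way to picture the lookup idea is a small registry that remembers a rebuild callback for every session, producer, or consumer created from the current connection. This is only a sketch: `ConnectionRegistry`, `Register`, and `RebuildAll` are hypothetical names, not part of TIBCO.EMS, and the type parameter stands in for whatever connection type the application uses.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical registry (not a TIBCO.EMS type). It tracks one callback per
// object that was built from the current connection, so that all of them
// can be re-created once a replacement connection exists.
public class ConnectionRegistry<TConnection>
{
    private readonly object _sync = new object();
    private readonly List<Action<TConnection>> _rebuilders =
        new List<Action<TConnection>>();

    // Each component registers a callback that re-creates its own
    // session/producer/consumer from the connection it is handed.
    public void Register(Action<TConnection> rebuild)
    {
        if (rebuild == null) throw new ArgumentNullException("rebuild");
        lock (_sync) { _rebuilders.Add(rebuild); }
    }

    // Invokes every registered callback with the new connection and
    // returns how many were invoked. An exception thrown by a callback
    // propagates and stops the remaining rebuilds.
    public int RebuildAll(TConnection newConnection)
    {
        Action<TConnection>[] snapshot;
        lock (_sync) { snapshot = _rebuilders.ToArray(); }
        foreach (Action<TConnection> rebuild in snapshot)
        {
            rebuild(newConnection);
        }
        return snapshot.Length;
    }
}
```

In an application, the connection's exception handler would create the replacement connection and then call `RebuildAll` with it. That is the manual route; if the built-in reconnect can be made to work, none of this is needed.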

So, for now, let's dig in and see if the client is set up to receive failover notification (TIBCO EMS User's Guide, p. 292), and make sure the raised exception is caught, contains the failover URL, and is being handled properly.

Anthony Mastrean
I've set up Tibems.SetExceptionOnFTSwitch(true) so I'm now seeing when our connection fails. We have not yet been able to test in a failover setting but I am getting the exceptions when the connection goes away. Built-in reconnect isn't working when the server comes back though.
ScottCher
We've checked server-client and client-server heartbeats. They were disabled in the test environment and I thought that might be the cause for reconnect not working. With them enabled and set to 10s, we still don't get reconnect attempts.
ScottCher
The fact that I can trap for server connection failure is a good sign but I'd rather have the reconnect logic from the EMS library do its job than have to loop reconnect tries manually.
ScottCher
I'm in a similar situation. I'm refactoring an older messaging system that was doing infinite loops trying to re-initialize the Connection instance... I'm hoping to get away from that. I don't have much practical knowledge though, I've been working mostly from the TIBCO docs.
Anthony Mastrean
@ajmastrean: meaning, of course, that you have access to the docs. I basically had to beg to get access. They were locked on a network drive so I couldn't even read them. I've been flying blind in EMS ever since we started implementing.
ScottCher
I'll post here if TIBCO has anything to say on the subject - our TIB Admin is putting in a problem ticket including some of my code - or if we resolve the issue in some other way. Thanks for your help.
ScottCher
A: 

First off, yes, I am answering my own question. It's important to note, however, that without ajmastrean, I would be nowhere. Thank you so much!

ONE: ConnectionFactory.SetReconnAttemptCount, SetReconnAttemptDelay, SetReconnAttemptTimeout should be set appropriately. I think the default values re-try too quickly (on the order of 1/2 second between retries). Our EMS servers can take a long time to failover because of network storage, etc - so 5 retries at 1/2s intervals is nowhere near long enough.

TWO: I believe it's important to enable the client-server and server-client heartbeats. I wasn't able to verify this, but without those in place, the client might not get the notification that the server is offline or switching in failover mode. This, of course, is a server-side setting for EMS.

THREE: You can watch for the failover event by setting Tibems.SetExceptionOnFTSwitch(true); and then wiring up an exception event handler. When in a single-server environment, you will see a "Connection has been terminated" message. However, if you are in a fault-tolerant multi-server environment, you will see this: "Connection has performed fault-tolerant switch to <server url>". You don't strictly need this notification, but it can be useful (especially in testing).

FOUR: This is apparently not clear in the EMS documentation, but connection reconnect will NOT work in a single-server environment. You need to be in a multi-server, fault-tolerant environment. There is a trick, however. You can put the same server in the connection list twice. Strange, I know, but it enables the built-in reconnect logic to work.
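Points ONE and FOUR can be illustrated with two small helpers. This is a hedged sketch: `EmsReconnectHelpers` and both method names are hypothetical and not part of TIBCO.EMS, and the host `emshost:7222` in the usage note is made up. The window estimate assumes each attempt may use its full timeout and is followed by one full delay; the library's actual timing may differ. The URL helper assumes the usual comma-separated form of an EMS fault-tolerant server list.

```csharp
using System;

// Hypothetical helpers; none of these names come from TIBCO.EMS.
public static class EmsReconnectHelpers
{
    // Rough upper bound on how long the client keeps retrying, assuming
    // every attempt runs to its timeout and is followed by one delay.
    public static TimeSpan EstimateRetryWindow(int attemptCount, int delayMs, int timeoutMs)
    {
        long totalMs = (long)attemptCount * ((long)delayMs + (long)timeoutMs);
        return TimeSpan.FromMilliseconds(totalMs);
    }

    // Builds a server list that names the same server twice, separated by
    // a comma, which is the single-server trick described in point FOUR.
    public static string DuplicateServer(string serverUrl)
    {
        if (serverUrl == null) throw new ArgumentNullException("serverUrl");
        return serverUrl + "," + serverUrl;
    }
}
```

With the figures quoted in point ONE, `EstimateRetryWindow(5, 500, 0)` comes to 2.5 seconds, which is far shorter than a slow failover. `EstimateRetryWindow(30, 120000, 2000)` comes to 61 minutes. `DuplicateServer("tcp://emshost:7222")` returns `"tcp://emshost:7222,tcp://emshost:7222"`.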

Some code:

private void initEMS()
{
    // Raise an exception event when the client switches servers in a fault-tolerant setup.
    Tibems.SetExceptionOnFTSwitch(true);
    _ConnectionFactory = new TIBCO.EMS.TopicConnectionFactory(<server>);
    _ConnectionFactory.SetReconnAttemptCount(30);      // 30 retries
    _ConnectionFactory.SetReconnAttemptDelay(120000);  // 2 minutes between retries (ms)
    _ConnectionFactory.SetReconnAttemptTimeout(2000);  // 2 seconds per attempt (ms)
    _Connection = _ConnectionFactory.CreateTopicConnection(<username>, <password>);
    _Connection.ExceptionHandler += new EMSExceptionHandler(_Connection_ExceptionHandler);
}

private void _Connection_ExceptionHandler(object sender, EMSExceptionEventArgs args)
{
    EMSException e = args.Exception;
    // Single-server failure: "Connection has been terminated"
    // Fault-tolerant multi-server switch: "Connection has performed fault-tolerant switch to <server url>"
    MessageBox.Show(e.ToString());
}
ScottCher