I have a simple JMS application deployed on OC4J on an AIX server. The application listens to some queues and sends to other queues on a WebSphere MQ instance running on an AS400 server.

My connections to these queues are terminated/closed with the error MQJMS1016 when they stay idle for some time (this by itself is not the problem). When that happens I attempt to recover the connection, and it works. However, the old connection remains stuck on the MQ side and does not go away until it is terminated manually.

The recovery code goes as follows:

public void recover() {
    cleanup();
    init();
}

public void cleanup(){
    if (session != null) {
        try {
            session.close();
        } catch (JMSException e) {
        }
    }
    if (connection != null) {
        try {
            connection.close();
        } catch (JMSException e) {
        }
    }
}

public void init(){
    // typical initialization of the connection, session and queue...
}
A: 

Since the orphaned connections (stuck connections on the MQ side) do not affect message processing (i.e. they do not consume messages), we left things as they were until the maximum number of connections allowed on the MQ was reached.

At that point the recovery no longer worked, and the MQ administrator had to clean up the orphaned connections manually. The good news is that searching for this particular problem led to an issue reported on the IBM support site:

check here

Ahmad
The APAR referenced just tells you how to tune the channels so WMQ reaps the orphans quicker. Although this is helpful, it is still only a workaround and no substitute for fixing the root cause and printing linked exceptions.
T.Rob
A: 

The MQJMS1016 is an internal error and indicates that the connection loss is due to something wrong with the code or WMQ itself. Tuning the channels will help but you really need to get to the problem of why the app is spewing orphaned connections fast enough to exhaust all available channels.

The first thing I'd want to do is check the versions of WMQ and of the WMQ client that are running. If this is new development, be sure you are using the WMQ v7 client because v6 is end-of-life as of Sept 2011. The v7 client works with v6 QMgrs until you are able to upgrade those as well. Once you get to a v7 client and QMgr, there are quite a few channel tuning and reconnection options available to you.

The WMQ v7 client download is here: http://bit.ly/bXM0q3

Also, note that the reconnect logic in the code above does not sleep between attempts. If a client throws connection requests at the listener at a high rate, it can overload the WMQ listener and execute a very effective DoS attack. I recommend sleeping a few seconds between attempts.
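As a rough illustration of sleeping between attempts, here is a minimal sketch. It is not taken from the question's code: the class name ReconnectWithDelay, the method retry, and its parameters are all hypothetical, and it uses a plain Callable so it carries no JMS dependency.

```java
import java.util.concurrent.Callable;

// Hypothetical helper, for illustration only: retries an operation
// with a fixed pause between failed attempts.
final class ReconnectWithDelay {

    private ReconnectWithDelay() {
    }

    /**
     * Calls attempt until it succeeds or maxAttempts have been made,
     * sleeping delayMillis after each failure except the last one.
     * Rethrows the last failure if every attempt fails.
     */
    static <T> T retry(Callable<T> attempt, int maxAttempts, long delayMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int i = 1; i <= maxAttempts; i++) {
            try {
                return attempt.call();
            } catch (Exception e) {
                last = e;
                System.err.println("attempt " + i + " failed: " + e);
                if (i < maxAttempts) {
                    // Pause so a failing client does not flood the listener.
                    Thread.sleep(delayMillis);
                }
            }
        }
        throw last;
    }
}
```

In the code from the question, the attempt would wrap the reconnect, for example retry(() -> { init(); return null; }, 10, 5000L), assuming init() is changed to let its JMSException propagate instead of swallowing it. The attempt count and delay shown are placeholders, not tuned values.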

Finally, please, PLEASE print the linked exceptions in your JMSException catch blocks. If you have a problem with a JMS transport provider, the JMS linked exception will contain any low-level error info. In the case of WMQ it contains the Reason Code, such as 2035 MQRC_NOT_AUTHORIZED or 2033 MQRC_NO_MSG_AVAILABLE. Here's an example:

try {
  // ... code that might throw a JMSException ...
} catch (JMSException je) {
  System.err.println("caught "+je);
  Exception e = je.getLinkedException();
  if (e != null) {
    System.err.println("linked exception: "+e);
  } else {
    System.err.println("No linked exception found.");
  }
}

If you get an error at 2am some night, your WMQ administrator will thank you for the linked exceptions.

T.Rob
The network was configured to terminate any idle connection, so the problem was clear to us and the linked exception was kind of irrelevant (however, I'll check the issue's history and provide the linked exception when I have more time). The connection recovery was actually on a 30-second timeout, and the maximum-connections problem took several days to occur (not as often as my original post may suggest). Eventually, we had to send keep-alive messages through every connection we initiate.
Ahmad
That's excellent news! For what it's worth, the linked exception thing is not specific to WMQ. Any transport provider has the option of putting relevant info in there. So if it becomes a coding standard it will help with any JMS code. I have many clients who won't accept code to production without it. Either way, glad to hear the keepalives are maintaining the connections now.
T.Rob