views:

561

answers:

0

Hi.
We have a system running an application connecting to a cluster of OpenMQ brokers (non-HA).
It's been running for about a week without restart.
This week we noticed the cluster stopped delivering messages to the consumers (I believe these were the messages on second broker).
After checking it in the morning we saw it did not deliver some of the messages produced on previous day in the evening.
We restarted one of the application servers and we got this in its logs (there were no errors in the logs before the restart, changed the name of the hosts for privacy):

2009-05-27 07:55:57,765 ERROR [STDERR] 27-May-2009 07:55:57 com.sun.messaging.jmq.jmsclient.AckQueue printInfo
WARNING: [W2003]: Broker not responding [ADD_CONSUMER(14)] for 120 seconds. Still trying..., broker addr=host1:1093(8095), connectionID=9108285608706294272, clientID=null, consumerID=6
2009-05-27 07:59:57,765 ERROR [STDERR] 27-May-2009 07:59:57 com.sun.messaging.jmq.jmsclient.AckQueue printInfo
WARNING: [W2003]: Broker not responding [ADD_CONSUMER(14)] for 360 seconds. Still trying..., broker addr=host1:1093(8095), connectionID=9108285608706294272, clientID=null, consumerID=6
2009-05-27 08:06:54,250 ERROR [STDERR] 27-May-2009 08:06:54 com.sun.messaging.jmq.jmsclient.ExceptionHandler logCaughtException
WARNING: [I500]: Caught JVM Exception: java.net.SocketException: Connection reset
2009-05-27 08:06:54,265 ERROR [STDERR] Exception in thread "Thread-12"


Nothing unusual was found in the host1 logs.

On the host2:

[26/May/2009:19:58:13 BST] [B1071]: Established cluster connection to broker mq://host1:1093/?instName=prod&brokerSessionUID=1429929463489486080[pr
od1/host1:8098]
[27/May/2009:07:58:13 BST] WARNING [B2062]: Lock request timed out for resid = queue:MY_QUEUE_1.
Following brokers did not respond :
mq://host1:1093/?instName=prod&brokerSessionUID=1429929463489486080
[27/May/2009:07:59:13 BST] WARNING [B2062]: Lock request timed out for resid = queue:MY_QUEUE_1.
Following brokers did not respond :
mq://host1:1093/?instName=prod&brokerSessionUID=1429929463489486080
[27/May/2009:08:00:13 BST] WARNING [B2062]: Lock request timed out for resid = queue:MY_QUEUE_1.
Following brokers did not respond :
mq://host1:1093/?instName=prod&brokerSessionUID=1429929463489486080
[27/May/2009:08:01:13 BST] WARNING [B2062]: Lock request timed out for resid = queue:MY_QUEUE_1.
Following brokers did not respond :
mq://host1:1093/?instName=prod&brokerSessionUID=1429929463489486080
[27/May/2009:08:02:13 BST] WARNING [B2062]: Lock request timed out for resid = queue:MY_QUEUE_1.
Following brokers did not respond :
mq://host1:1093/?instName=prod&brokerSessionUID=1429929463489486080
[27/May/2009:08:03:13 BST] WARNING [B2062]: Lock request timed out for resid = queue:MY_QUEUE_1.
Following brokers did not respond :
mq://host1:1093/?instName=prod&brokerSessionUID=1429929463489486080
[27/May/2009:08:04:13 BST] WARNING [B2062]: Lock request timed out for resid = queue:MY_QUEUE_1.
Following brokers did not respond :
mq://host1:1093/?instName=prod&brokerSessionUID=1429929463489486080
[27/May/2009:08:05:13 BST] WARNING [B2062]: Lock request timed out for resid = queue:MY_QUEUE_1.
Following brokers did not respond :
mq://host1:1093/?instName=prod&brokerSessionUID=1429929463489486080
[27/May/2009:08:06:13 BST] WARNING [B2062]: Lock request timed out for resid = queue:MY_QUEUE_1.
Following brokers did not respond :
mq://host1:1093/?instName=prod&brokerSessionUID=1429929463489486080
[27/May/2009:08:07:13 BST] WARNING [B2062]: Lock request timed out for resid = queue:MY_QUEUE_1.
Following brokers did not respond :
mq://host1:1093/?instName=prod&brokerSessionUID=1429929463489486080
[27/May/2009:08:08:13 BST] WARNING [B2062]: Lock request timed out for resid = queue:MY_QUEUE_1.
Following brokers did not respond :
mq://host1:1093/?instName=prod&brokerSessionUID=1429929463489486080
[27/May/2009:08:09:13 BST] WARNING [B2062]: Lock request timed out for resid = queue:MY_QUEUE_1.
Following brokers did not respond :
mq://host1:1093/?instName=prod&brokerSessionUID=1429929463489486080
[27/May/2009:08:11:09 BST] WARNING [B2062]: Lock request timed out for resid = queue:MY_QUEUE_5.
Following brokers did not respond :
mq://host1:1093/?instName=prod&brokerSessionUID=1429929463489486080
[27/May/2009:08:12:09 BST] WARNING [B2062]: Lock request timed out for resid = queue:MY_QUEUE_6.
Following brokers did not respond :
mq://host1:1093/?instName=prod&brokerSessionUID=1429929463489486080
[27/May/2009:08:12:51 BST] [B1072]: Closed cluster connection to broker mq://host1:1093/?instName=prod&brokerSessionUID=1429929463489486080
[27/May/2009:08:12:51 BST] ERROR [B3100]: Unexpected Broker Internal Error : [Uncaught Exception]:
com.sun.messaging.jmq.jmsserver.util.BrokerException: [B4117]: Unexpected Broker Exception: [Unexpected Error processing message]
at com.sun.messaging.jmq.jmsserver.data.PacketRouter.handleMessage(PacketRouter.java:199)
at com.sun.messaging.jmq.jmsserver.service.imq.IMQIPConnection.readData(IMQIPConnection.java:1489)
at com.sun.messaging.jmq.jmsserver.service.imq.IMQIPConnection.process(IMQIPConnection.java:644)
at com.sun.messaging.jmq.jmsserver.service.imq.OperationRunnable.process(OperationRunnable.java:170)
at com.sun.messaging.jmq.jmsserver.util.pool.BasicRunnable.run(BasicRunnable.java:493)
at java.lang.Thread.run(Thread.java:619)
Caused by: java.lang.IllegalStateException: Session [9108285608706310144] is closed
at com.sun.messaging.jmq.jmsserver.core.Session.attachConsumer(Session.java:766)
at com.sun.messaging.jmq.jmsserver.data.handlers.ConsumerHandler.createConsumer(ConsumerHandler.java:702)
at com.sun.messaging.jmq.jmsserver.data.handlers.ConsumerHandler.handle(ConsumerHandler.java:306)
at com.sun.messaging.jmq.jmsserver.data.PacketRouter.handleMessage(PacketRouter.java:181)
... 5 more
[27/May/2009:08:12:57 BST] [B1071]: Established cluster connection to broker mq://host1:1093/?instName=prod&brokerSessionUID=1429929463489486080[pr
od1/host1:8098]


We're running version 4.2 of OpenMQ.

Here's the common part of brokers' properties:

imq.jms.threadpool_model=dedicated
imq.instanceconfig.version=300
imq.jms.max_threads=1000
imq.admin.tcp.port=8096
imq.autocreate.queue.consumerFlowLimit=50000
imq.cluster.min_threads=5
imq.message.expiration.interval=90
imq.ssladmin.tls.port=8099
imq.jms.min_threads=10
imq.persist.file.message.max_record_size=5m
imq.ssljms.max_threads=500
imq.ssljms.min_threads=10
imq.jms.tcp.port=8095
imq.service.activelist=jms,admin,ssljms,ssladmin
imq.ssljms.tls.port=8097
imq.autocreate.destination.maxBytesPerMsg=10m
imq.autocreate.queue.maxNumBackupConsumers=50
imq.jmx.rmiregistry.port=8093
imq.log.file.rolloversecs=8640
imq.portmapper.port=7676
imq.cluster.max_threads=10
imq.cluster.port=8098
imq.cluster.brokerlist=host1:1093,host2:1093


First broker's specific cluster properties:
imq.cluster.hostname=host1

Second broker's specific cluster properties:
imq.cluster.hostname=host2

Jms client's (consumer / producer) settings:
imqAddressList=host1:1093,host2:1093
imqReconnect=true
imqReconnectAttempts=1
imqAddressListIterations=-1
imqAddressListBehavior=RANDOM

Any ideas on what happened?