views:

3064

answers:

0

I have an application running Quartz 1.6.1 w/persistent job store, with MySQL 5.1 as the DB. This application used to boot up okay in Tomcat6. At some point, it began throwing the following exception upon EVERY boot:

- MisfireHandler: Error handling misfires: Failure obtaining db row lock: Lock wait timeout exceeded; try restarting transaction
org.quartz.impl.jdbcjobstore.LockException: Failure obtaining db row lock: Lock wait timeout exceeded; try restarting transaction [See nested exception: java.sql.SQLException: Lock wait timeout exceeded; try restarting transaction]
    at org.quartz.impl.jdbcjobstore.StdRowLockSemaphore.executeSQL(StdRowLockSemaphore.java:112)
    at org.quartz.impl.jdbcjobstore.DBSemaphore.obtainLock(DBSemaphore.java:112)
    at org.quartz.impl.jdbcjobstore.JobStoreSupport.doRecoverMisfires(JobStoreSupport.java:3075)
    at org.quartz.impl.jdbcjobstore.JobStoreSupport$MisfireHandler.manage(JobStoreSupport.java:3838)
    at org.quartz.impl.jdbcjobstore.JobStoreSupport$MisfireHandler.run(JobStoreSupport.java:3858)
Caused by: java.sql.SQLException: Lock wait timeout exceeded; try restarting transaction
    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1055)
    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:956)
    at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3491)
    at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3423)
    at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1936)
    at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2060)
    at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2542)
    at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1734)
    at com.mysql.jdbc.PreparedStatement.executeQuery(PreparedStatement.java:1885)
    at com.mchange.v2.c3p0.impl.NewProxyPreparedStatement.executeQuery(NewProxyPreparedStatement.java:76)
    at org.quartz.impl.jdbcjobstore.StdRowLockSemaphore.executeSQL(StdRowLockSemaphore.java:92)
    ... 4 more

I should mention this application also utilizes JPA w/Hibernate using C3P0 for data source connection pooling. This exception is always thrown directly after JPA finishes updating my schema.

First, I upgraded to Quartz 1.6.5 and the exception went away, but the application appears frozen. The last thing in the logs - where the exception used to be - is:

...hbm2ddl stuff...
2969 [Thread-1] INFO org.hibernate.tool.hbm2ddl.SchemaUpdate - schema update complete
- Handling 6 trigger(s) that missed their scheduled fire-time.

With nothing coming after, and the webapp not servicing requests; they just hang indefinitely.

When I run mysql command-line client with SHOW INNODB STATUS right after the exception, it does consistently show two suspicious transactions:

----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 49, signal count 49
Mutex spin waits 0, rounds 2100, OS waits 0
RW-shared spins 115, OS waits 49; RW-excl spins 0, OS waits 0
------------
TRANSACTIONS
------------
Trx id counter 0 165688
Purge done for trx's n:o < 0 165685 undo n:o < 0 0
History list length 12
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 0 0, not started, OS thread id 5012
MySQL thread id 8, query id 1798 localhost 127.0.0.1 root
SHOW INNODB STATUS
---TRANSACTION 0 165687, ACTIVE 300 sec, OS thread id 3772
2 lock struct(s), heap size 320, 1 row lock(s)
MySQL thread id 30, query id 1795 localhost 127.0.0.1 my_app
---TRANSACTION 0 165685, ACTIVE 360 sec, OS thread id 5460
2 lock struct(s), heap size 320, 1 row lock(s), undo log entries 1
MySQL thread id 34, query id 1680 localhost 127.0.0.1 my_app

I'm looking for guidance on how to further investigate this issue. Perhaps if I could somehow identify the owners of these two transactions, or what resources they are locking?

Update: I deleted all rows in the *qrtz_simple_triggers* table without problem. I then tried to do the same on the *qrtz_triggers* table and my MySQL client threw a "Lock wait timeout exceeded" error. At this point I stopped my (still hanging) application and was then able to delete all rows of the *qrtz_triggers* table. Once this was done I was able to successfully boot my application.

It appears I need to log a new Quartz bug, but I'd like to be able to give them more information about what is actually hapenning here. So, as per the original question, how can I troubleshoot these types of issues?