views:

344

answers:

4

I suppose this issue applies to deadlocks, live-locks, or just lock wait timeouts.

I'm trying to figure out what query is causing a lock that is preventing another query from executing. Oracle has (if memory serves) a LOCK table that you can join onto itself to determine which queries are locking others. I need a way to accomplish the same this in MySQL.

The scenario is that we have long-running jobs that occasionally create a nested transaction that updates the progress field. That way, we're not losing the transactional-ness of the work while keeping the user informed of the progress (i.e. percent complete). The nested transaction sometimes throws a lock timeout exception.

This is very odd, since none of the other work should write - or even read - from the Job table. Sifting through the raw SQL log confirms this. Here is the transaction section from SHOW ENGINE INNODB STATUS:

------------
TRANSACTIONS
------------
Trx id counter 0 479427
Purge done for trx's n:o < 0 479425 undo n:o < 0 0
History list length 19
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 0 0, not started, OS thread id 3192
MySQL thread id 31, query id 17417 localhost 127.0.0.1 root
show engine innodb status
---TRANSACTION 0 0, not started, OS thread id 3776
MySQL thread id 29, query id 13062 localhost 127.0.0.1 root
---TRANSACTION 0 479190, not started, OS thread id 2540
MySQL thread id 23, query id 16103 localhost 127.0.0.1 testuser
---TRANSACTION 0 479422, not started, OS thread id 2536
MySQL thread id 19, query id 17338 localhost 127.0.0.1 testuser
---TRANSACTION 0 479194, not started, OS thread id 2528
MySQL thread id 20, query id 16103 localhost 127.0.0.1 testuser
---TRANSACTION 0 479189, not started, OS thread id 2776
MySQL thread id 22, query id 16103 localhost 127.0.0.1 testuser
---TRANSACTION 0 479426, ACTIVE 3 sec, OS thread id 2544 starting index read
mysql tables in use 1, locked 1
LOCK WAIT 2 lock struct(s), heap size 320, 1 row lock(s)
MySQL thread id 18, query id 17414 localhost 127.0.0.1 testuser Updating
update Job set progress=0.000482780829770491 where id=28
------- TRX HAS BEEN WAITING 3 SEC FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 0 page no 23927 n bits 72 index "PRIMARY" of table "test"."job" trx id 0 479426 lock_mode X locks rec but not gap waiting
Record lock, heap no 5 PHYSICAL RECORD: n_fields 12; compact format; info bits 0
 0: len 8; hex 000000000000001c; asc         ;; 1: len 6; hex 0000000750bf; asc     P ;; 2: len 7; hex 0000005d4d2aeb; asc    ]M* ;; 3: len 8; hex 0000000000000005; asc         ;; 4: len 8; hex 0000000000000004; asc         ;; 5: len 8; hex 0000000000000006; asc         ;; 6: len 1; hex 49; asc I;; 7: len 14; hex 800000000000000002749e0e51a6; asc          t  Q ;; 8: len 30; hex 3c6d61703e0a20203c656e7472793e0a202020203c737472696e673e7061; asc <map>   <entry>     <string>pa;...(truncated); 9: len 8; hex 80001245d33e7e3c; asc    E >~<;; 10: SQL NULL; 11: SQL NULL;

------------------
---TRANSACTION 0 479418, ACTIVE 31 sec, OS thread id 960
14 lock struct(s), heap size 1024, 8 row lock(s), undo log entries 3
MySQL thread id 21, query id 17404 localhost 127.0.0.1 testuser

It appears clear that there are only two transactions, and that one of the 14 locks of transaction 479418 is blocking transaction 479426. I would love to know what the offending query is. Any ideas? Even listing the 14 locks and the queries that caused them would be great.

Thanks!

A: 

In a session of msql> try

12.5.5.31. SHOW PROCESSLIST Syntax

SHOW [FULL] PROCESSLIST

SHOW PROCESSLIST shows you which threads are running. You can also get this information from the INFORMATION_SCHEMA PROCESSLIST table or the mysqladmin processlist command. If you have the PROCESS privilege, you can see all threads. Otherwise, you can see only your own threads (that is, threads associated with the MySQL account that you are using). If you do not use the FULL keyword, only the first 100 characters of each statement are shown in the Info field.

Don
This statement only shows you what is executing at exactly the time it is run. Given that my work process is executing hundreds of statements a seconds, the SQL logs are probably more appropriate to debug this problem.
Monkey Boson
A: 

One option is to enable the general query log, which will record all statements run against mysql. Just be careful it doesn't eat all your disk space.

Compare the logs against the ids in the innodb status output, and you'll find the culprit.

Yep... that's what I meant by "SQL log" in the question. Sifting through this log doesn't yield any references to the Job table (nor foreign keys into the table). I'm really curious why there is a lock timeout on an update to.this table given this scenario.
Monkey Boson
Hmm, if they fail it's possible that they never make it there. You'll need to tcpdump (don't forget -s 1500 to get the entire packet off the wire).
I'm not quite sure what you mean by this. I can confirm that the UPDATE statement to the Job table makes it to MySQL, as it is the last line in the log file. This is further supported by the log dump above that lists the UPDATE in transaction 479426. The problem is that there is an existing lock on the Job table that blocks the update.
Monkey Boson
You need to find out what other statement is locking the table. If you can't find this in the query log, you should try dumping all network traffic to mysql.
+2  A: 

If your queries are waiting up to three seconds, that would make them easy to capture in the slow-query-log. Another suggestion that Xaprb writes is to use InnoTop. There is a S.O. post on a similar InnoDB lock issue.

However, you might want to review the code and look for places where you are doing a whole table select. For example, if you have a table work items and you want to select those that are pending, doing a

SELECT * FROM queue WHERE status = 'pending' ORDER BY create_date LIMIT 1`

could be triggering a temp-file-sort condition that will occupy the whole table if it is in a transaction. Adding FOR UPDATE to the select could help it acquire the lock better. Apparently, clustering primary keys can help.

In my environment, my query connection will report an error on a transaction lock issue, so I see errors like: Deadlock found when trying to get lock; try restarting transaction. You might have to check for warnings if your query is actually failing. (This might not help if you can't change how the application reports query failures.)

memnoch_proxy
This is very good information... thank you! In my specific case, none of the queries in the outer transaction take very long (0.1 seconds is the longest), and - oddly - make no references whatsoever to the Job table, where the lock timeout occurs.Our solution involves Hibernate so none of the queries are particularly complicated (i.e. no FOR UPDATEs).Finally, I do have a specific error: "Lock wait timeout exceeded; try restarting transaction." However, given that the error happens in a nested transaction, no amount of transaction restarting helps.
Monkey Boson
+1  A: 

Well, despite the great answers (especially from memnoch_proxy - thank you for the well-thought-out response!), I had top work around this issue in an exceptionally ugly way. Instead of updating the progress on the Job table, I update another table that I had to create called JobProgress, which is not locked.

Very weird.

Monkey Boson