I have a hefty DB server with lots of very similar InnoDB databases. A query that I run often simply updates a timestamp on one row in a small table, and it normally takes 1-2 ms. Occasionally, at night, probably while backups and the maatkit replication tools are running, one or more of these queries will show "Updating" for several minutes. During that time, other queries, including the maatkit queries, seem to proceed normally, and no other queries appear to be executing. I have been unable to explain or fix this.

We are using MySQL 4.1.22 on Gentoo (2.6.21 kernel), on a pair of 4-way Xeon servers with 16 GB of RAM and RAID storage. Replication is in place and operating well, with maatkit confirming it nightly. InnoDB is using most of the RAM, and the CPUs are typically 70-80% idle. The table in question has about 100 rows of about 200 bytes each. I've tried with and without an index on the column in the WHERE clause, with no discernible change. No unusual log messages have been found (I checked the system messages and the MySQL error log).

Has anybody else heard of this? Solved something like this? Any ideas of how to investigate?

+1  A: 

When executing DML statements, InnoDB places locks on rows and index gaps.

The problem is that it locks all rows examined, not only those affected.

Say, if you run this query:

UPDATE  mytable
SET     value = 10
WHERE   col1 = 1
        AND col2 = 2

then the locking will depend on which index is used for the query:

  • If an index on (col1, col2) was used, only the affected rows will be locked (see the sketch after this list)

  • If an index on col1 only was used, all rows with col1 = 1 will be locked

  • If an index on col2 was used, all rows with col2 = 2 will be locked

  • If no index was used, all rows and index gaps will be locked (including the gaps on the PRIMARY KEY, so even an INSERT into an AUTO_INCREMENT column will block)
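
To get the narrowest locking for a query like the one above, you can give the optimizer an index that covers both WHERE columns. A minimal sketch, reusing the mytable, col1 and col2 names from the example (the index name is made up):

-- Composite index covering both columns in the WHERE clause, so the UPDATE
-- only examines (and therefore only locks) the matching rows and gaps.
CREATE INDEX ix_mytable_col1_col2 ON mytable (col1, col2);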

To make things worse, EXPLAIN in MySQL does not work on DML statements, so you'll have to guess which index was used, since the optimizer may pick whichever index it considers best.
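
A common workaround is to EXPLAIN a SELECT with the same WHERE clause and see which index the optimizer picks. This is only a sketch of the idea: the plan chosen for the actual UPDATE is not guaranteed to be identical.

EXPLAIN SELECT  *
FROM    mytable
WHERE   col1 = 1
        AND col2 = 2;

-- The "key" column of the EXPLAIN output shows the index the optimizer
-- chose for this WHERE clause.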

So it may well be that your replication tools and your updates lock the same records concurrently (and, as you can see, this can happen even when the WHERE conditions do not overlap).

Quassnoi
Wouldn't the processlist show "Locked" then? I have a "show full processlist" that shows maatkit working on one db, one of these trivial update queries "Updating" on a second db, another of the same trivial update queries "Updating" on a third db, and NO OTHER QUERIES running. I haven't posted the processlist because it includes client names I'd have to edit out.
PaulC
@PaulC: `SHOW PROCESSLIST` does not display `Locked` for `InnoDB` tables, only for `MyISAM`. Even if the operation against an `InnoDB` table is waiting for a lock, it will still be shown as `Updating`.
Quassnoi
Knowing that InnoDB will show "Updating" when it's really "Locked" is very helpful. Disabling maatkit's resynchronization temporarily stopped the problem, so it seems that something maatkit does holds a lock for quite a while. Still investigating...
PaulC
A: 

If you can get at the server while one of these queries is hanging, try running SHOW INNODB STATUS. Part of the mass of data you get from that is the status of all active transactions and the queries they are running against InnoDB tables. If your query is hanging because of another transaction's locks, it will be indicated in there. There are samples of the lock data here.
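
For example (the exact layout of the output varies between versions, so treat the section names below as approximate):

-- MySQL 4.1 syntax; newer versions prefer SHOW ENGINE INNODB STATUS
SHOW INNODB STATUS;

-- In the TRANSACTIONS section of the output, a blocked statement appears
-- with a "LOCK WAIT" marker and a description of the lock it is waiting for.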

As well, you mention that it seems to happen during backups. Are you using mysqldump for that? By default it locks tables while the dump is active so that the dumped data is consistent.
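
A rough illustration of the effect, with a hypothetical table name; a default mysqldump run takes read locks along these lines while it copies each table:

-- Session 1: roughly what a default (locking) dump does while reading a table
LOCK TABLES mytable READ;

-- Session 2: this UPDATE now waits until session 1 releases its locks
UPDATE mytable SET value = 10 WHERE col1 = 1;

-- Session 1: release the locks and let the UPDATE proceed
UNLOCK TABLES;

For purely InnoDB data, mysqldump's --single-transaction option is the usual way to get a consistent dump without holding table locks for the duration.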

Marc B
A: 

Using some of the information offered in the responses, we continued investigating and found some disturbing behavior on our server. A simple CHECK TABLE on any table in any database caused simple UPDATE queries on other tables, even in other databases, to block. I have no idea why this happens, but we could not reproduce it on MySQL 5.1, so we intend to upgrade our database server.
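
Roughly, the test looked like this (the database and table names here are made up for illustration):

-- Session 1: check any table in one database
CHECK TABLE db1.some_table;

-- Session 2: meanwhile, a trivial update in a different database
-- sat in "Updating" until the CHECK TABLE finished
UPDATE db2.small_table SET last_seen = NOW() WHERE id = 42;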

I don't think maatkit's mk-table-checksum issues a CHECK TABLE, but it has a similar effect. Turning off this script reduced the problem significantly, but we believe we cannot live without it.

I'm going to mark this as the answer to my question. Thanks for the help.

PaulC