See this previous question for some background. I'm trying to renumber a corrupted MPTT tree using SQL. The script works fine logically; it's just much too slow.

I repeatedly need to execute these two queries:

UPDATE `tree`
SET    `rght` = `rght` + 2
WHERE  `rght` > currentLeft;

UPDATE `tree`
SET    `lft` = `lft` + 2
WHERE  `lft` > currentLeft;

The table is defined as follows:

CREATE TABLE `tree` (

  `id`        char(36) NOT NULL DEFAULT '',
  `parent_id` char(36) DEFAULT NULL,
  `lft`       int(11) unsigned DEFAULT NULL,
  `rght`      int(11) unsigned DEFAULT NULL,
  ... (a couple more columns) ...,

  PRIMARY KEY (`id`),
  KEY `parent_id` (`parent_id`),
  KEY `lft` (`lft`),
  KEY `rght` (`rght`),
  ... (a few more indexes) ...

) ENGINE=InnoDB DEFAULT CHARSET=utf8;

The database is MySQL 5.1.37. There are currently ~120,000 records in the table. Each of the two UPDATE queries takes roughly 15 - 20 seconds to execute. The WHERE condition may apply to a majority of the records, so that almost all records need to be updated each time. In the worst case both queries are executed as many times as there are records in the database.

Is there a way to optimize this query by keeping the values in memory, delaying writing to disk, delaying index updates or something along these lines? The bottleneck seems to be hard disk throughput right now, as MySQL seems to be writing everything back to disk immediately.

Any suggestions appreciated.

+1  A: 

You're updating indexed columns, and indexes negatively impact (read: slow down) INSERTs and UPDATEs.

If this is a one-time need to get things correct (see the sketch after the list):

  1. Drop/delete the indexes on the columns being updated (lft, rght)
  2. Run the update statements
  3. Re-create the indexes (this can take time, possibly equivalent to what you already experience in total)
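A rough sketch of those statements, assuming the index names match the KEY definitions in the CREATE TABLE above:

-- 1. Drop the indexes on the columns being updated
ALTER TABLE `tree` DROP INDEX `lft`, DROP INDEX `rght`;

-- 2. ... run the renumbering UPDATE statements here ...

-- 3. Re-create the indexes in one pass afterwards
ALTER TABLE `tree` ADD INDEX `lft` (`lft`), ADD INDEX `rght` (`rght`);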
OMG Ponies
Together with these queries there are a number of `SELECT` queries being run (see [original problem and answer](http://stackoverflow.com/questions/3623645/how-to-repair-a-corrupted-mptt-tree-nested-set-in-the-database-using-sql/3634268#3634268)). Wouldn't dropping the indexes negatively impact those?
deceze
@deceze: Yes, the SELECTs could be slower if the optimizer was making use of the indexes (existence doesn't guarantee use), but the fact that queries are still being run against data that is changing underneath them is a bigger deal to me...
OMG Ponies
This is indeed the way I usually do it, but you don't have to drop the indexes per se. There is a DISABLE/ENABLE KEYS clause for ALTER TABLE that also does the trick. ENABLE KEYS is less effort and a lot faster than having to recreate the entire index.
Blizz
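For reference, the statements Blizz mentions would look like this (a sketch; as far as I know DISABLE KEYS only affects non-unique indexes on MyISAM tables, so on the InnoDB table above it may end up being a no-op):

-- Stop maintaining the non-unique indexes while the updates run
ALTER TABLE `tree` DISABLE KEYS;

-- ... run the UPDATE statements ...

-- Rebuild the disabled indexes in one pass
ALTER TABLE `tree` ENABLE KEYS;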
+3  A: 

I've never used it, but if you have enough memory, try a MEMORY table.

Create a table with the same structure as `tree`, do an `INSERT INTO ... SELECT FROM ...`, run your scripts against the MEMORY table, and write the results back.
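A minimal sketch of that approach (the working-table name `tree_mem` is just an example; it assumes none of the omitted columns are TEXT/BLOB, which the MEMORY engine doesn't support, and that max_heap_table_size is large enough for ~120,000 rows):

-- Copy the structure and switch the copy to the MEMORY engine
CREATE TABLE `tree_mem` LIKE `tree`;
ALTER TABLE `tree_mem` ENGINE=MEMORY;

-- Load the data into memory
INSERT INTO `tree_mem` SELECT * FROM `tree`;

-- ... run the renumbering UPDATEs against `tree_mem` instead of `tree` ...

-- Write the corrected values back and drop the working copy
UPDATE `tree`
JOIN   `tree_mem` USING (`id`)
SET    `tree`.`lft`  = `tree_mem`.`lft`,
       `tree`.`rght` = `tree_mem`.`rght`;

DROP TABLE `tree_mem`;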

andrem
This does indeed look like the best option. I'm currently experimenting with it... Does anybody know of any gotchas?
deceze
Thanks, works like a charm. Switching to a MEMORY table and choosing the right indexes cut the running time down from a few days to about 10 minutes. See [here](http://stackoverflow.com/questions/3623645/how-to-repair-a-corrupted-mptt-tree-nested-set-in-the-database-using-sql/3634268#3634268) for the full final script.
deceze
+2  A: 

Expanding on some ideas from the comments, as requested:

The default is to flush to disk after every commit. You can wrap multiple updates in a single commit, or change this parameter:

http://dev.mysql.com/doc/refman/5.1/en/innodb-parameters.html#sysvar_innodb_flush_log_at_trx_commit
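For example (a sketch; the variable is global and dynamic in 5.1, and the values 2 or 0 trade some durability for speed):

-- Flush the InnoDB log to disk about once per second instead of at every commit
SET GLOBAL innodb_flush_log_at_trx_commit = 2;

-- And/or batch many of the small UPDATEs into a single commit
START TRANSACTION;
-- ... a batch of UPDATE statements ...
COMMIT;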

The isolation level is simple to change; just make sure the level fits your design. This probably won't help here because a range update is being used, but it's good to know about when looking for more concurrency:

http://dev.mysql.com/doc/refman/5.1/en/set-transaction.html
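For example (a sketch; READ COMMITTED is just one possible level):

-- Relax the isolation level for this session before running the updates
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;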

Ultimately, after noticing the range update in the query, your best bet is the MEMORY table that andrem pointed out. You'll also probably find some extra performance by using BTREE indexes instead of the MEMORY engine's default of HASH:

http://www.mysqlperformanceblog.com/2008/02/01/performance-gotcha-of-mysql-memory-tables/
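For example, on a MEMORY working copy such as the `tree_mem` sketch above (HASH indexes don't help range conditions like `lft > currentLeft`):

-- BTREE for the range scans on lft/rght; HASH remains fine for equality lookups on id/parent_id
ALTER TABLE `tree_mem`
  ADD INDEX `lft_btree`  USING BTREE (`lft`),
  ADD INDEX `rght_btree` USING BTREE (`rght`);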

Rob Olmos
Thanks for pointing out the hash/B-Tree indexes. I'm currently using hashes for the id/parent_id columns and B-Trees for the lft/rght columns, seems to be the best fit. Using a MEMORY table brought it up from seconds-per-record to records-per-second already, let's see if I can squeeze any more performance out of this.
deceze
It turns out that a hybrid method is actually the best way to go. I'm using hash indexes for the first part of the operation, then switch to B-Trees for the second part. That speeds up the whole routine by 500%. :)
deceze