views:

177

answers:

3

I'm trying to figure out why one of our migration scripts is taking forever we are trying to do an update that joins from another table to get a relevant piece of data.

Each table (A, B) has about 100,000 rows.

# now populate the ACHIEVEMENT_INSTANCE.OBJECTIVE_INSTANCE_ID
update A a, B b
set a.INSTANCE_ID = b.INSTANCE_ID
where a.ID = b.ID;

It seems like we're dealing with an INNER JOIN that is creating some type of Cartesian product between the 2 tables 100,000 X 100,000 which is taking FOREVER (probably wayyyy to long).

According to MySQL update uses an inner join by default not sure if we could use some other type of JOIN that wouldn't be so shitty.

MySQL documentation UPDATE:

UPDATE [LOW_PRIORITY] [IGNORE] table_references
   SET col_name1={expr1|DEFAULT} [, col_name2={expr2|DEFAULT}] ...
   [WHERE where_condition]

The table_references clause lists the tables involved in the join. Its syntax is described in Section 12.2.8.1, “JOIN Syntax”. Here is an example: UPDATE items,month SET items.price=month.price WHERE items.id=month.id; The preceding example shows an inner join that uses the comma operator, but multiple-table UPDATE statements can use any type of join allowed in SELECT statements, such as LEFT JOIN.

A: 

Doesn't MySQL support correlated subqueries now?

If so, try this:

 update A a, B b
 set a.INSTANCE_ID = (SELECT b.INSTANCE_ID FROM B b WHERE a.ID = b.ID);

(The above assumes there is a B for every A--if not you would need a WHERE EXISTS() to avoid overwriting other a.INSTANCE_ID with nulls).

Could be that the query optimizer is not choosing the right type of join, or there is a locking issue.

It could also be an index issue--if a.INSTANCE_ID is has a clustered index, for instance.

richardtallent
+2  A: 

try an explicit join to see if it improve the performance:

update A a
join B b on a.ID = b.ID
set a.INSTANCE_ID = b.INSTANCE_ID
longneck
+2  A: 

As Greg stated in the comments:

Do you have an index on a.ID and b.ID?

We did not have indexes on those columns. Once we added them the query took 30 seconds:

create index id_idx on A(id);
create index id_idx on B(id);
MarkPowell
Again, thanks Greg. Wish we could have marked your comment as the Accepted Answer. :)
MarkPowell
Which kind of effect does an index have on the database, that decreases the result time that much?
daemonfire300
I'm not positive but I believe an index offers constant time lookup O(n) where a non-indexed column may have to do a full table-scan basically a linear search on 100,000 joins with 100,000 records resulting in O(n^2). (That's at least how I rationalized it with myself haha)
Dougnukem