Hi,

I have a MySQL UPDATE query which takes a long time to complete. Am I missing a much simpler way to achieve the same result?

"UPDATE table2, table1
SET table2.id_occurrences = (SELECT SUM(IF(id = table2.id, 1, 0)) FROM table1)
WHERE table2.id = table1.id;"
  • table2 contains all possible values of id, exactly one record for each.
  • table1 contains some values of id, but there are multiple records of some values.
  • I need to update records in table2 to show the number of occurrences of the corresponding value of id in table1. The above query does the job, but it takes about 3 minutes when table1 contains 500 records and table2 contains 30,000 records. I have much bigger tables to process, so this is too long :)

Thanks in advance.

+5  A: 

I think your join on the update is perhaps not necessary...

UPDATE table2
    SET table2.id_occurrences = (SELECT COUNT(*) FROM table1
                                     WHERE table2.id = table1.id);
Brian Hooper
Indeed. And make sure you've got indexes on the id columns in each table.
Matt Gibson
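To try the single-table UPDATE together with the indexing advice, here is a minimal sketch using Python's built-in sqlite3 module. It runs against an in-memory SQLite database rather than MySQL, so timings will not carry over. The sample rows and the index names idx_table1_id and idx_table2_id are made up for illustration.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption: the
# statements below use only syntax common to both engines).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.execute("CREATE TABLE table1 (id INTEGER)")
cur.execute("CREATE TABLE table2 (id INTEGER, id_occurrences INTEGER)")

# Hypothetical sample data: id 1 occurs three times, id 3 twice, ids 2 and 4 never.
cur.executemany("INSERT INTO table1 (id) VALUES (?)", [(1,), (1,), (3,), (1,), (3,)])
cur.executemany("INSERT INTO table2 (id, id_occurrences) VALUES (?, NULL)",
                [(1,), (2,), (3,), (4,)])

# Indexes on the id columns, as suggested in the comment above.
cur.execute("CREATE INDEX idx_table1_id ON table1 (id)")
cur.execute("CREATE INDEX idx_table2_id ON table2 (id)")

# The single-table UPDATE with a correlated COUNT(*) subquery.
cur.execute("""
    UPDATE table2
    SET id_occurrences = (SELECT COUNT(*) FROM table1
                          WHERE table1.id = table2.id)
""")
conn.commit()

result = cur.execute("SELECT id, id_occurrences FROM table2 ORDER BY id").fetchall()
print(result)
```

COUNT(*) over zero matching rows returns 0, so ids that never appear in table1 end up with 0 rather than NULL.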
Ah yes, that's much quicker. I see now, I'm not even updating table1. Thanks a lot.
edanfalls
+1  A: 

I'd go for something like:

UPDATE table2
SET id_occurrences = (SELECT count(*) FROM table1
                      WHERE table1.id = table2.id)
Jonathan
If I could accept this answer too, I would. Thanks for the help, Brian just got there first :)
edanfalls
He did indeed! And with almost identical code. Great minds think alike, they say. ;)
Jonathan
+1  A: 

Avoid subqueries, use joins:

UPDATE table2
LEFT JOIN table1 ON (table2.id = table1.id)
SET table2.id_occurrences = COUNT(table1.id)
GROUP BY table2.id

Oh, UPDATE doesn't support GROUP BY. Try this query:

UPDATE table2
LEFT JOIN (
   SELECT id, COUNT(*) AS cnt FROM table1 GROUP BY id
) AS t1
ON (table2.id = t1.id)
-- ids missing from table1 have no match, so cnt is NULL; store 0 instead
SET table2.id_occurrences = COALESCE(t1.cnt, 0);
Naktibalda
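One detail worth checking with a LEFT JOIN against an aggregated subquery: ids that never occur in table1 have no matching row, so the joined count is NULL for them, not 0. The sketch below shows this with a plain SELECT in SQLite (via Python's sqlite3 module, as a stand-in for MySQL); the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER)")
conn.execute("CREATE TABLE table2 (id INTEGER, id_occurrences INTEGER)")

# Hypothetical sample data: id 2 exists in table2 but never in table1.
conn.executemany("INSERT INTO table1 (id) VALUES (?)", [(1,), (1,), (3,)])
conn.executemany("INSERT INTO table2 (id) VALUES (?)", [(1,), (2,), (3,)])

# Same join shape as the UPDATE, but as a SELECT so the raw and the
# COALESCE-wrapped counts can be compared side by side.
rows = conn.execute("""
    SELECT table2.id, t1.cnt, COALESCE(t1.cnt, 0)
    FROM table2
    LEFT JOIN (
        SELECT id, COUNT(*) AS cnt FROM table1 GROUP BY id
    ) AS t1 ON (table2.id = t1.id)
    ORDER BY table2.id
""").fetchall()
print(rows)
```

For id 2 the raw cnt comes back as None (NULL), while the COALESCE column gives 0. Whether that matters depends on whether id_occurrences should hold 0 or NULL for unused ids.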
Thanks. Is there a particular reason to avoid subqueries? I'd be interested to see the difference in speed between this method and Brian's/Jonathan's, which brought the query time down from 150s to 20s, but I get the following error with the syntax: `#1064 - You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'GROUP BY table2.id' at line 4`
edanfalls
A dependent subquery is executed for each row of the main query, which can result in a rather large number of additional queries. It's best to rewrite the query as a JOIN. I made up that query, and it looks like UPDATE doesn't support GROUP BY, so I will give you a query with a joined subquery soon. A joined subquery is executed only once.
Naktibalda
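The difference in work can be modeled outside the database. This is only a rough sketch in plain Python, not a description of how MySQL actually executes either query (that depends on the version, the optimizer, and the indexes); the function names and sample ids are made up.

```python
from collections import Counter

# Hypothetical sample ids standing in for the two tables.
table1_ids = [1, 1, 3, 1, 3]
table2_ids = [1, 2, 3, 4]

def count_per_row(table1_ids, table2_ids):
    """Model of a dependent subquery: rescan table1 for every row of table2."""
    return {t2: sum(1 for t1 in table1_ids if t1 == t2) for t2 in table2_ids}

def count_once(table1_ids, table2_ids):
    """Model of a joined subquery: aggregate table1 once, then look up each id."""
    counts = Counter(table1_ids)  # single pass, like GROUP BY id
    return {t2: counts.get(t2, 0) for t2 in table2_ids}

per_row = count_per_row(table1_ids, table2_ids)
once = count_once(table1_ids, table2_ids)
print(per_row == once)
```

Both functions produce the same counts, but the first does len(table1) comparisons for each row of table2, while the second touches each table once.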
What's the execution time of this query?
Naktibalda
Wow, that works and it's super quick! Only 70s to complete with 4,000,000 entries in table1 and 30,000 in table2. I wasn't being lazy, I just couldn't get it to work using the MySQL docs. I'm fine with any other language's docs, but MySQL always throws me :D Thanks for your help.
edanfalls
On the smaller tables that took 20s with the above method, this took 1.7s. The performance hit seems to increase exponentially with table size though, which makes sense.
edanfalls
A: 
"UPDATE table2, table1 SET table2.id_occurrences = (SELECT SUM(IF(id = table2.id, 1, 0)) FROM table1) WHERE table2.id in (select distinct table1.id from table1) and table2.id = table1.id;"
Tank