I'm a relative novice when it comes to databases. We are using MySQL and I'm currently trying to speed up a SQL statement that seems to take a while to run. I looked around on SO for a similar question but didn't find one.
The goal is to remove all the rows in table A that have a matching id in table B.
I'm currently doing the following:
DELETE FROM a WHERE EXISTS (SELECT b.id FROM b WHERE b.id = a.id);
There are approximately 100K rows in table a and about 22K rows in table b. The column 'id' is the PK for both tables.
This statement takes about 3 minutes to run on my test box - Pentium D, XP SP3, 2GB ram, MySQL 5.0.67. This seems slow to me. Maybe it isn't, but I was hoping to speed things up. Is there a better/faster way to accomplish this?
EDIT:
Some additional information that might be helpful. Tables A and B have the same structure as I've done the following to create table B:
CREATE TABLE b LIKE a;
Table a (and thus table b) has a few indexes to help speed up queries that are made against it. Again, I'm a relative novice at DB work and still learning. I don't know how much of an effect, if any, this has on things. I assume that it does have an effect as the indexes have to be cleaned up too, right? I was also wondering if there were any other DB settings that might affect the speed.
Also, I'm using INNO DB.
Here is some additional info that might be helpful to you.
Table A has a structure similar to this (I've sanitized this a bit):
DROP TABLE IF EXISTS `frobozz`.`a`;
CREATE TABLE `frobozz`.`a` (
`id` bigint(20) unsigned NOT NULL auto_increment,
`fk_g` varchar(30) NOT NULL,
`h` int(10) unsigned default NULL,
`i` longtext,
`j` bigint(20) NOT NULL,
`k` bigint(20) default NULL,
`l` varchar(45) NOT NULL,
`m` int(10) unsigned default NULL,
`n` varchar(20) default NULL,
`o` bigint(20) NOT NULL,
`p` tinyint(1) NOT NULL,
PRIMARY KEY USING BTREE (`id`),
KEY `idx_l` (`l`),
KEY `idx_h` USING BTREE (`h`),
KEY `idx_m` USING BTREE (`m`),
KEY `idx_fk_g` USING BTREE (`fk_g`),
KEY `fk_g_frobozz` (`id`,`fk_g`),
CONSTRAINT `fk_g_frobozz` FOREIGN KEY (`fk_g`) REFERENCES `frotz` (`g`)
) ENGINE=InnoDB AUTO_INCREMENT=179369 DEFAULT CHARSET=utf8 ROW_FORMAT=DYNAMIC;
I suspect that part of the issue is there are a number of indexes for this table.
Table B looks similar to table B, though it only contains the columns id
and h
.
Also, the profiling results are as follows:
starting 0.000018
checking query cache for query 0.000044
checking permissions 0.000005
Opening tables 0.000009
init 0.000019
optimizing 0.000004
executing 0.000043
end 0.000005
end 0.000002
query end 0.000003
freeing items 0.000007
logging slow query 0.000002
cleaning up 0.000002
SOLVED
Thanks to all the responses and comments. They certainly got me to think about the problem. Kudos to dotjoe for getting me to step away from the problem by asking the simple question "Do any other tables reference a.id?"
The problem was that there was a DELETE TRIGGER on table A which called a stored procedure to update two other tables, C and D. Table C had a FK back to a.id and after doing some stuff related to that id in the stored procedure, it had the statement
DELETE FROM c WHERE c.id = theId;
I looked into the EXPLAIN statement and rewrote this as
EXPLAIN SELECT * FROM c WHERE c.other_id = 12345;
so I could see what this was doing and it gave me the following info:
id 1
select_type SIMPLE
table c
type ALL
possible_keys NULL
key NULL
key_len NULL
ref NULL
rows 2633
Extra using where
This told me that it was a painful operation to make and since it was going to get called 22500 times (for the given set of data being deleted), that was the problem. Once I created an INDEX on that other_id column and reran the EXPLAIN, I got:
id 1
select_type SIMPLE
table c
type ref
possible_keys Index_1
key Index_1
key_len 8
ref const
rows 1
Extra
Much better, in fact really great.
I added that Index_1 and my delete times are in line with the times reported by mattkemp. This was a really subtle error on my part due to shoe-horning some additional functionality at the last minute. It turned out that most of the suggested alternative DELETE/SELECT statements, as Daniel stated, ended up taking essentially the same amount of time and as soulmerge mentioned, the statement was pretty much the best I was going to be able to construct based on what I needed to do. Once I provided an index for this other table C, my DELETEs were fast.
Postmortem:
Two lessons learned came out of this exercise. First, it is clear that I didn't leverage the power of the EXPLAIN statement to get a better idea of the impact of my SQL queries. That's a rookie mistake, so I'm not going to beat myself up about that one. I'll learn from that mistake. Second, the offending code was the result of a 'get it done quick' mentality and inadequate design/testing led to this problem not showing up sooner. Had I generated several sizable test data sets to use as test input for this new functionality, I'd have not wasted my time nor yours. My testing on the DB side was lacking the depth that my application side has in place. Now I've got the opportunity to improve that.
Thanks to all.