I have a couple of tables (products and suppliers) and want to find out which items are no longer listed in the suppliers table.
Table uc_products has the products. Table uc_supplier_csv has supplier stocks. uc_products.model joins against uc_suppliers.sku.
I am seeing very long queries when trying to identify the stock in the products table which are not referred to in the suppliers table. I only want to extract the nid of the entries which match; sid IS NULL is just so I can identify which items don't have a supplier.
For the first of the queries below, it takes the DB server (4GB ram / 2x 2.4GHz intel) an hour to get a result (507 rows). I didn't wait for the second query to finish.
How can I make this query more optimal? Is it due to the mismatched character sets?
I was thinking that the following would be the most efficient SQL to use:
SELECT nid, sid
FROM uc_products p
LEFT OUTER JOIN uc_supplier_csv c
ON p.model = c.sku
WHERE sid IS NULL ;
For this query, I get the following EXPLAIN result:
mysql> EXPLAIN SELECT nid, sid FROM uc_products p LEFT OUTER JOIN uc_supplier_csv c ON p.model = c.sku WHERE sid IS NULL;
+----+-------------+-------+------+---------------+------+---------+------+--------+-------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+--------+-------------------------+
| 1 | SIMPLE | p | ALL | NULL | NULL | NULL | NULL | 6526 | |
| 1 | SIMPLE | c | ALL | NULL | NULL | NULL | NULL | 126639 | Using where; Not exists |
+----+-------------+-------+------+---------------+------+---------+------+--------+-------------------------+
2 rows in set (0.00 sec)
I would have thought that the keys idx_sku and idx_model would be valid for use here, but they aren't. Is that because the tables' default charsets do not match? One is UTF-8 and one is latin1.
I also considered this form:
SELECT nid
FROM uc_products
WHERE model
NOT IN (
SELECT DISTINCT sku FROM uc_supplier_csv
) ;
EXPLAIN shows the following results for that query:
mysql> explain select nid from uc_products where model not in ( select sku from uc_supplier_csv ) ;
+----+--------------------+-----------------+-------+-----------------------+---------+---------+------+--------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-----------------+-------+-----------------------+---------+---------+------+--------+--------------------------+
| 1 | PRIMARY | uc_products | ALL | NULL | NULL | NULL | NULL | 6520 | Using where |
| 2 | DEPENDENT SUBQUERY | uc_supplier_csv | index | idx_sku,idx_sku_stock | idx_sku | 258 | NULL | 126639 | Using where; Using index |
+----+--------------------+-----------------+-------+-----------------------+---------+---------+------+--------+--------------------------+
2 rows in set (0.00 sec)
And just so I don't miss anything out, here are a few more exciting details: the table sizes and stats, and the table structure :)
mysql> show table status where Name in ( 'uc_supplier_csv', 'uc_products' ) ;
+-----------------+--------+---------+------------+--------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+---------------------+-------------------+----------+----------------+---------+
| Name | Engine | Version | Row_format | Rows | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time | Update_time | Check_time | Collation | Checksum | Create_options | Comment |
+-----------------+--------+---------+------------+--------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+---------------------+-------------------+----------+----------------+---------+
| uc_products | MyISAM | 10 | Dynamic | 6520 | 89 | 585796 | 281474976710655 | 232448 | 912 | NULL | 2009-04-24 11:03:15 | 2009-10-12 14:23:43 | 2009-04-24 11:03:16 | utf8_general_ci | NULL | | |
| uc_supplier_csv | MyISAM | 10 | Dynamic | 126639 | 26 | 3399704 | 281474976710655 | 5864448 | 0 | NULL | 2009-10-12 14:28:25 | 2009-10-12 14:28:25 | 2009-10-12 14:28:27 | latin1_swedish_ci | NULL | | |
+-----------------+--------+---------+------------+--------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+---------------------+-------------------+----------+----------------+---------+
and
CREATE TABLE `uc_products` (
`vid` mediumint(9) NOT NULL default '0',
`nid` mediumint(9) NOT NULL default '0',
`model` varchar(255) NOT NULL default '',
`list_price` decimal(10,2) NOT NULL default '0.00',
`cost` decimal(10,2) NOT NULL default '0.00',
`sell_price` decimal(10,2) NOT NULL default '0.00',
`weight` float NOT NULL default '0',
`weight_units` varchar(255) NOT NULL default 'lb',
`length` float unsigned NOT NULL default '0',
`width` float unsigned NOT NULL default '0',
`height` float unsigned NOT NULL default '0',
`length_units` varchar(255) NOT NULL default 'in',
`pkg_qty` smallint(5) unsigned NOT NULL default '1',
`default_qty` smallint(5) unsigned NOT NULL default '1',
`unique_hash` varchar(32) NOT NULL,
`ordering` tinyint(2) NOT NULL default '0',
`shippable` tinyint(2) NOT NULL default '1',
PRIMARY KEY (`vid`),
KEY `idx_model` (`model`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
CREATE TABLE `uc_supplier_csv` (
`sid` int(10) unsigned NOT NULL default '0',
`sku` varchar(255) default NULL,
`stock` int(10) unsigned NOT NULL default '0',
`list_price` decimal(8,2) default '0.00',
KEY `idx_sku` (`sku`),
KEY `idx_stock` (`stock`),
KEY `idx_sku_stock` (`sku`,`stock`),
KEY `idx_sid` (`sid`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1
EDIT: Adding query plans for a couple of suggested queries from Martin below:
mysql> explain SELECT nid FROM uc_products p WHERE NOT EXISTS ( SELECT 1 FROM uc_supplier_csv c WHERE p.model = c.sku ) ;
+----+--------------------+-------+-------+---------------+---------+---------+------+--------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-------+-------+---------------+---------+---------+------+--------+--------------------------+
| 1 | PRIMARY | p | ALL | NULL | NULL | NULL | NULL | 6526 | Using where |
| 2 | DEPENDENT SUBQUERY | c | index | NULL | idx_sku | 258 | NULL | 126639 | Using where; Using index |
+----+--------------------+-------+-------+---------------+---------+---------+------+--------+--------------------------+
2 rows in set (0.00 sec)
mysql> explain SELECT nid FROM uc_products WHERE model NOT IN ( SELECT sku FROM uc_supplier_csv ) ;
+----+--------------------+-----------------+-------+-----------------------+---------+---------+------+--------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-----------------+-------+-----------------------+---------+---------+------+--------+--------------------------+
| 1 | PRIMARY | uc_products | ALL | NULL | NULL | NULL | NULL | 6526 | Using where |
| 2 | DEPENDENT SUBQUERY | uc_supplier_csv | index | idx_sku,idx_sku_stock | idx_sku | 258 | NULL | 126639 | Using where; Using index |
+----+--------------------+-----------------+-------+-----------------------+---------+---------+------+--------+--------------------------+
2 rows in set (0.00 sec)