views:

221

answers:

1

I have a couple of tables (products and suppliers) and want to find out which items are no longer listed in the suppliers table.

Table uc_products has the products. Table uc_supplier_csv has supplier stocks. uc_products.model joins against uc_suppliers.sku.

I am seeing very long queries when trying to identify the stock in the products table which are not referred to in the suppliers table. I only want to extract the nid of the entries which match; sid IS NULL is just so I can identify which items don't have a supplier.

For the first of the queries below, it takes the DB server (4GB ram / 2x 2.4GHz intel) an hour to get a result (507 rows). I didn't wait for the second query to finish.

How can I make this query more optimal? Is it due to the mismatched character sets?

I was thinking that the following would be the most efficient SQL to use:

         SELECT nid, sid 
           FROM uc_products p
LEFT OUTER JOIN uc_supplier_csv c
             ON p.model = c.sku
         WHERE sid IS NULL ;

For this query, I get the following EXPLAIN result:

mysql> EXPLAIN SELECT nid, sid FROM uc_products p LEFT OUTER JOIN uc_supplier_csv c ON p.model = c.sku WHERE sid IS NULL;
+----+-------------+-------+------+---------------+------+---------+------+--------+-------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows   | Extra                   |
+----+-------------+-------+------+---------------+------+---------+------+--------+-------------------------+
|  1 | SIMPLE      | p     | ALL  | NULL          | NULL | NULL    | NULL |   6526 |                         | 
|  1 | SIMPLE      | c     | ALL  | NULL          | NULL | NULL    | NULL | 126639 | Using where; Not exists | 
+----+-------------+-------+------+---------------+------+---------+------+--------+-------------------------+
2 rows in set (0.00 sec)

I would have thought that the keys idx_sku and idx_model would be valid for use here, but they aren't. Is that because the tables' default charsets do not match? One is UTF-8 and one is latin1.

I also considered this form:

SELECT nid 
  FROM uc_products 
 WHERE model 
NOT IN ( 
         SELECT DISTINCT sku FROM uc_supplier_csv 
       ) ;

EXPLAIN shows the following results for that query:

mysql> explain select nid from uc_products where model not in ( select sku from uc_supplier_csv ) ;
+----+--------------------+-----------------+-------+-----------------------+---------+---------+------+--------+--------------------------+
| id | select_type        | table           | type  | possible_keys         | key     | key_len | ref  | rows   | Extra                    |
+----+--------------------+-----------------+-------+-----------------------+---------+---------+------+--------+--------------------------+
|  1 | PRIMARY            | uc_products     | ALL   | NULL                  | NULL    | NULL    | NULL |   6520 | Using where              | 
|  2 | DEPENDENT SUBQUERY | uc_supplier_csv | index | idx_sku,idx_sku_stock | idx_sku | 258     | NULL | 126639 | Using where; Using index | 
+----+--------------------+-----------------+-------+-----------------------+---------+---------+------+--------+--------------------------+
2 rows in set (0.00 sec)

And just so I don't miss anything out, here are a few more exciting details: the table sizes and stats, and the table structure :)

mysql> show table status where Name in ( 'uc_supplier_csv', 'uc_products' ) ;
+-----------------+--------+---------+------------+--------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+---------------------+-------------------+----------+----------------+---------+
| Name            | Engine | Version | Row_format | Rows   | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time         | Update_time         | Check_time          | Collation         | Checksum | Create_options | Comment |
+-----------------+--------+---------+------------+--------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+---------------------+-------------------+----------+----------------+---------+
| uc_products     | MyISAM |      10 | Dynamic    |   6520 |             89 |      585796 | 281474976710655 |       232448 |       912 |           NULL | 2009-04-24 11:03:15 | 2009-10-12 14:23:43 | 2009-04-24 11:03:16 | utf8_general_ci   |     NULL |                |         | 
| uc_supplier_csv | MyISAM |      10 | Dynamic    | 126639 |             26 |     3399704 | 281474976710655 |      5864448 |         0 |           NULL | 2009-10-12 14:28:25 | 2009-10-12 14:28:25 | 2009-10-12 14:28:27 | latin1_swedish_ci |     NULL |                |         | 
+-----------------+--------+---------+------------+--------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+---------------------+-------------------+----------+----------------+---------+

and

CREATE TABLE `uc_products` (
  `vid` mediumint(9) NOT NULL default '0',
  `nid` mediumint(9) NOT NULL default '0',
  `model` varchar(255) NOT NULL default '',
  `list_price` decimal(10,2) NOT NULL default '0.00',
  `cost` decimal(10,2) NOT NULL default '0.00',
  `sell_price` decimal(10,2) NOT NULL default '0.00',
  `weight` float NOT NULL default '0',
  `weight_units` varchar(255) NOT NULL default 'lb',
  `length` float unsigned NOT NULL default '0',
  `width` float unsigned NOT NULL default '0',
  `height` float unsigned NOT NULL default '0',
  `length_units` varchar(255) NOT NULL default 'in',
  `pkg_qty` smallint(5) unsigned NOT NULL default '1',
  `default_qty` smallint(5) unsigned NOT NULL default '1',
  `unique_hash` varchar(32) NOT NULL,
  `ordering` tinyint(2) NOT NULL default '0',
  `shippable` tinyint(2) NOT NULL default '1',
  PRIMARY KEY  (`vid`),
  KEY `idx_model` (`model`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 

CREATE TABLE `uc_supplier_csv` (
  `sid` int(10) unsigned NOT NULL default '0',
  `sku` varchar(255) default NULL,
  `stock` int(10) unsigned NOT NULL default '0',
  `list_price` decimal(8,2) default '0.00',
  KEY `idx_sku` (`sku`),
  KEY `idx_stock` (`stock`),
  KEY `idx_sku_stock` (`sku`,`stock`),
  KEY `idx_sid` (`sid`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1

EDIT: Adding query plans for a couple of suggested queries from Martin below:

mysql> explain SELECT nid FROM uc_products p WHERE NOT EXISTS ( SELECT 1 FROM uc_supplier_csv c WHERE p.model = c.sku ) ;
+----+--------------------+-------+-------+---------------+---------+---------+------+--------+--------------------------+
| id | select_type        | table | type  | possible_keys | key     | key_len | ref  | rows   | Extra                    |
+----+--------------------+-------+-------+---------------+---------+---------+------+--------+--------------------------+
|  1 | PRIMARY            | p     | ALL   | NULL          | NULL    | NULL    | NULL |   6526 | Using where              | 
|  2 | DEPENDENT SUBQUERY | c     | index | NULL          | idx_sku | 258     | NULL | 126639 | Using where; Using index | 
+----+--------------------+-------+-------+---------------+---------+---------+------+--------+--------------------------+
2 rows in set (0.00 sec)

mysql> explain SELECT nid FROM uc_products WHERE model NOT IN ( SELECT sku  FROM uc_supplier_csv ) ;
+----+--------------------+-----------------+-------+-----------------------+---------+---------+------+--------+--------------------------+
| id | select_type        | table           | type  | possible_keys         | key     | key_len | ref  | rows   | Extra                    |
+----+--------------------+-----------------+-------+-----------------------+---------+---------+------+--------+--------------------------+
|  1 | PRIMARY            | uc_products     | ALL   | NULL                  | NULL    | NULL    | NULL |   6526 | Using where              | 
|  2 | DEPENDENT SUBQUERY | uc_supplier_csv | index | idx_sku,idx_sku_stock | idx_sku | 258     | NULL | 126639 | Using where; Using index | 
+----+--------------------+-----------------+-------+-----------------------+---------+---------+------+--------+--------------------------+
2 rows in set (0.00 sec)
+4  A: 

Perhaps try using NOT EXISTS rather than counts? For example:

SELECT nid 
  FROM uc_products p
 WHERE NOT EXISTS ( 
       SELECT 1 
         FROM uc_supplier_csv c
        WHERE p.model = c.sku
       )

SO user Quassnoi has a short article outlining some tests that suggest that this might also be worth a try:

SELECT nid 
  FROM uc_products
 WHERE model NOT IN ( 
       SELECT sku 
       FROM uc_supplier_csv
       )

basically as per your original query, without the DISTINCTion.

Another one for you Chris, this time with help for the cross-encoding join:

SELECT nid
  FROM uc_products p
 WHERE NOT EXISTS (
       SELECT 1
       FROM uc_supplier_csv c
       WHERE CONVERT( p.model USING latin1 )  = c.sku
       )
martin clayton
This query was the fastest suggested solution to return the correct result. It took 637s to execute.
Chris Burgess
What did the query plan look like?
martin clayton
added to question (formatting isn't working for me in the comments?)
Chris Burgess
Third time lucky? See above.
martin clayton
Wow. 0.17s instead of ten minutes. Nice. The third example (using WHERE CONVERT() above) gives the right answer almost instantly. Is it about the CONVERT you think? I had updated my local table to be DEFAULT CHARSET UTF8 but didn't see any change in behaviour from doing that.
Chris Burgess
The convert helps the optimiser when there is a 'cross encoding' needed. Best if you can match encodings in the join fields though. If you applied 'ALTER TABLE uc_supplier_csv DEFAULT CHARACTER SET utf8' you should see (using SHOW CREATE TABLE) that the existing sku field data has not been converted - so the optimiser still needs help. If you can apply 'ALTER TABLE uc_supplier_csv CONVERT TO CHARACTER SET utf8' that should migrate the data over, and the convert should no longer be needed.
martin clayton