views:

218

answers:

4

I need to lookup all my products (sku's) their latest stock quantity.
I have one table (called "stock") with 315k+ records containing this information (a new batch of data is added every day, for most sku's). The reference data is in another table (called "stockfile").

This is the query to do it:

SELECT s1 . * , f1 . *
FROM stock s1
JOIN stockfile f1 ON ( s1.stockfileid = f1.stockfileid )
LEFT OUTER JOIN ( stock s2
JOIN stockfile f2 ON ( s2.stockfileid = f2.stockfileid )
) ON ( s1.sku = s2.sku
AND ( f1.date < f2.date
OR f1.date = f2.date
AND f1.stockfileid < f2.stockfileid) )
WHERE s2.sku IS NULL

These are the table definitions

SHOW CREATE TABLE stock:

CREATE TABLE `stock` (
 `stockid` bigint(20) NOT NULL AUTO_INCREMENT,
 `sku` char(25) NOT NULL,
 `quantity` int(5) NOT NULL,
 `creationdate` datetime NOT NULL,
 `stockfileid` smallint(5) unsigned NOT NULL,
 `touchdate` datetime NOT NULL,
 PRIMARY KEY (`stockid`),
 KEY `stock_sku` (`sku`),
 KEY `stock_stockfileid` (`stockfileid`)
) ENGINE=MyISAM AUTO_INCREMENT=316039 DEFAULT CHARSET=latin1

SHOW CREATE TABLE stockfile:

CREATE TABLE `stockfile` (
 `stockfileid` smallint(5) unsigned NOT NULL AUTO_INCREMENT,
 `filename` varchar(25) NOT NULL,
 `creationdate` datetime DEFAULT NULL,
 `touchdate` datetime DEFAULT NULL,
 `date` datetime DEFAULT NULL,
 `begindate` datetime DEFAULT NULL,
 `enddate` datetime DEFAULT NULL,
 PRIMARY KEY (`stockfileid`),
 KEY `stockfile_date` (`date`)
) ENGINE=MyISAM AUTO_INCREMENT=266 DEFAULT CHARSET=latin1

Without any extra indexes it takes... forever. I added these and it sped up to about 250 seconds:

CREATE INDEX stock_sku ON stock(sku);
CREATE INDEX stock_stockfileid ON stock(stockfileid);
CREATE INDEX stockfile_date ON stockfile(date);

This is the EXPLAIN on the original query, with these indexes.

id  select_type  table   type    possible_keys               key        key_len  ref                     rows   Extra
1   SIMPLE       s1      ALL    stock_stockfileid           NULL       NULL     NULL                    316038
1   SIMPLE       f1      eq_ref  PRIMARY                     PRIMARY    2        kompare.s1.stockfileid  1     
1   SIMPLE       s2      ref    stock_sku,stock_stockfileid stock_sku  25       kompare.s1.sku          12     Using where
1   SIMPLE       f2      eq_ref  PRIMARY,stockfile_date      PRIMARY    2        kompare.s2.stockfileid  1

Is there another way to speed things up?

  • Thanks to Bill Karwin for solving the original query!
+2  A: 

Normalization of the data would go far in speeding up the queries, furthermore, if you are running on a slow machine this will adversely affect the speed of in which your results will be returned. Show me an example query to this table then I can better understand what your attempt is on that angle.

drlouie - louierd
The first query on top is the one I'll be using almost exclusively. I may add a `WHERE` now and then, but otherwise ...
skerit
+2  A: 

The default values in my.cnf typically are set for systems with VERY little memory by today's standards. If you are using those default values, that may be the single best place to look for performance gains. Ensure you are allocating all of the memory you can spare to MySQL.

mysqltuner can make good starting recommendations for allocating memory between the various parts of MySQL that can use it.

If you created your indices before adding most of the data, you may see vast improvement by performing ANALYZE TABLE on your tables. I saw one query drop from 24 seconds to 1 second just by doing that.

Your EXPLAIN indicates that MySQL is doing a table scan to satisfy WHERE s2.sku IS NULL prior to narrowing the search. That's very expensive.

f1.date < f2.date
OR f1.date = f2.date

should be able to be re-written as

f1.date <= f2.date

though I doubt that matters to the optimizer.

Could you explain in plain English what you are trying to do with the query? That might help shed light on how it could be simplified.

Eric J.
Every day a list, containing the stock quantity of our products, is added to the "stock" table. Because I wanted to deduplicate as much as possible I moved the date information (what date is this, from which file, ...) to another table, called stockfile.What I want now is all my products with their latest stock quantity. (And I have to use the date, not the stockfileid)
skerit
Hmm... this table scan for null looks not only expensive but also quite meaningless for the outer join null condition. But I can't read mysql explains well.
Michael Krelin - hacker
If you are loading a stock update for every single SKU, do you need the condition f1.date < f2.date? What is the purpose of WHERE s2.sku IS NULL?
Eric J.
Eric, this query is easy to read - the purpose of null condition is to make sure there's no files for sku with a later date.
Michael Krelin - hacker
I'm not sure that it should work as expected, though. I'd think the output of the second join should have an alias for this purpose, not the table inside the join.
Michael Krelin - hacker
That scan does look intense, but I have no idea what to do about it. I don't believe I can add any more indexes.
skerit
+4  A: 

I'm not sure I got your query right, but if it's safe to suppose that maximal date has also a maximal stockfileid (like your OR condition half-suggests) maybe something like this query would be of help:

SELECT s1.*, f1.*
 FROM
  stock s1 JOIN stockfile f1 USING (stockfileid)
  JOIN (
   SELECT sku, max(date) AS maxdate, max(stockfileid) AS maxfileid
   FROM stock JOIN stockfile USING (stockfileid)
   GROUP BY sku
  ) AS dfi ON (s1.sku,f1.date,f1.stockfileid)=(dfi.sku,maxdate,maxfileid);

Not sure whether this is what you want and whether it's faster, but it should be. On the other hand, you don't need to take date into account at all, if fileid has it all. Anyway, I think this kind prefiltering may help as a starting point.

Michael Krelin - hacker
Problem is I don't trust the stockfileid. It *is* possible that files get mixed up and arrive later. So an older file would get a newer stockfileid. Silly but possible.
skerit
Well, it's possible to rewrite the ON condition then...
Michael Krelin - hacker
umm.. or not.. ;-)
Michael Krelin - hacker
Well look at that! That did it. The query runs in about a second!
skerit
Btw: I can remove the maxfileid references, I still get the correct result.
skerit
Not sure if you get it reliably, though. You might have added another level of filtering, which then would probably add another second, but I'm too tired to try to do it now and going to sleep. Happy hacking! ;-)
Michael Krelin - hacker
(the note on reliability was about removing fileid)
Michael Krelin - hacker
+2  A: 

I'm not sure if this is something you could do with your app, but instead of computing the quantity for each sku every single time you run the query, it would be more efficient to store the sku and quantity in a separate table and then just update the data whenever a new stockfile is received. That way you incur the cost of calculating this once per scorefile and not once per query. It's a bit of an upfront cost to compute this but it saves you a lot down the line.

TskTsk