ansaurus

Question

Speed up MySQL query containing 300k+ records

Answer 1

+2 A:

Normalizing the data would go a long way toward speeding up the queries. Also, if you are running on a slow machine, that will hurt how quickly your results are returned. Show me an example query against this table so I can better understand what you are attempting.

drlouie - louierd 2009-12-18 00:04:48

The first query on top is the one I'll be using almost exclusively. I may add a `WHERE` now and then, but otherwise ...

skerit 2009-12-18 00:10:02

Answer 2

+2 A:

The default values in my.cnf typically are set for systems with VERY little memory by today's standards. If you are using those default values, that may be the single best place to look for performance gains. Ensure you are allocating all of the memory you can spare to MySQL.
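A quick way to check whether the defaults are still in effect is to ask the server directly. This is only a sketch: which buffer matters depends on the storage engine, and the thread never says which engine the `stock` table uses, so both common ones are shown.

```sql
-- Which storage engine does the table use? (see the Engine column)
SHOW TABLE STATUS LIKE 'stock';

-- Main index cache for MyISAM tables
SHOW VARIABLES LIKE 'key_buffer_size';

-- Main data and index cache for InnoDB tables
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
```

If the reported sizes are only a few megabytes on a machine with gigabytes of free RAM, raising the matching setting in my.cnf is probably worth trying.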

mysqltuner can make good starting recommendations for allocating memory between the various parts of MySQL that can use it.

If you created your indices before adding most of the data, you may see vast improvement by performing ANALYZE TABLE on your tables. I saw one query drop from 24 seconds to 1 second just by doing that.
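For the two tables discussed in this thread, that would look roughly like the following (table names taken from the asker's description; the statement only refreshes index statistics and does not change any data):

```sql
-- Recompute key distribution statistics so the optimizer
-- can pick better join orders and indexes.
ANALYZE TABLE stock, stockfile;
```

Re-running EXPLAIN on the slow query afterwards shows whether the plan actually changed.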

Your EXPLAIN indicates that MySQL is doing a table scan to satisfy WHERE s2.sku IS NULL prior to narrowing the search. That's very expensive.

f1.date < f2.date
OR f1.date = f2.date

can be rewritten as

f1.date <= f2.date

though I doubt that matters to the optimizer.

Could you explain in plain English what you are trying to do with the query? That might help shed light on how it could be simplified.

Eric J. 2009-12-18 00:26:38

Every day a list containing the stock quantity of our products is added to the "stock" table. Because I wanted to deduplicate as much as possible, I moved the date information (what date is this, from which file, ...) to another table, called stockfile. What I want now is all my products with their latest stock quantity. (And I have to use the date, not the stockfileid.)

skerit 2009-12-18 00:42:31

Hmm... this table scan for null looks not only expensive but also quite meaningless for the outer join null condition. But I can't read MySQL EXPLAIN output well.

Michael Krelin - hacker 2009-12-18 00:48:10

If you are loading a stock update for every single SKU, do you need the condition f1.date < f2.date? What is the purpose of WHERE s2.sku IS NULL?

Eric J. 2009-12-18 00:51:16

Eric, this query is easy to read - the purpose of the null condition is to make sure there are no files for the sku with a later date.

Michael Krelin - hacker 2009-12-18 00:55:51
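For readers who don't have the question's query in front of them (it is not reproduced on this page), the pattern being discussed is the "LEFT JOIN ... IS NULL" anti-join. The sketch below is a reconstruction of that general shape, not the asker's actual query; only the names `stock`, `stockfile`, `sku`, `stockfileid` and `date` come from the thread.

```sql
-- Keep a stock row only if no other row for the same sku
-- belongs to a stockfile with a later date.
SELECT s1.*, f1.*
FROM stock s1
JOIN stockfile f1 ON f1.stockfileid = s1.stockfileid
LEFT JOIN (
    stock s2
    JOIN stockfile f2 ON f2.stockfileid = s2.stockfileid
) ON s2.sku = s1.sku
 AND f2.date > f1.date
WHERE s2.sku IS NULL;   -- no later row was found
```

Every candidate row has to be compared against all other rows for the same sku, which is likely why this form gets slow as the table grows.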

I'm not sure that it should work as expected, though. I'd think the output of the second join should have an alias for this purpose, not the table inside the join.

Michael Krelin - hacker 2009-12-18 00:57:07

That scan does look intense, but I have no idea what to do about it. I don't believe I can add any more indexes.

skerit 2009-12-18 01:01:42

Answer 3

+4 A:

I'm not sure I understood your query correctly, but if it's safe to assume that the latest date also has the highest stockfileid (as your OR condition half-suggests), maybe something like this query would help:

SELECT s1.*, f1.*
 FROM
  stock s1 JOIN stockfile f1 USING (stockfileid)
  JOIN (
   SELECT sku, max(date) AS maxdate, max(stockfileid) AS maxfileid
   FROM stock JOIN stockfile USING (stockfileid)
   GROUP BY sku
  ) AS dfi ON (s1.sku,f1.date,f1.stockfileid)=(dfi.sku,maxdate,maxfileid);

Not sure whether this is what you want and whether it's faster, but it should be. On the other hand, you don't need to take date into account at all, if fileid has it all. Anyway, I think this kind of prefiltering may help as a starting point.

Michael Krelin - hacker 2009-12-18 00:27:05

Problem is I don't trust the stockfileid. It *is* possible that files get mixed up and arrive later. So an older file would get a newer stockfileid. Silly but possible.

skerit 2009-12-18 01:03:14

Well, it's possible to rewrite the ON condition then...

Michael Krelin - hacker 2009-12-18 01:04:53

umm.. or not.. ;-)

Michael Krelin - hacker 2009-12-18 01:06:07

Well look at that! That did it. The query runs in about a second!

skerit 2009-12-18 01:13:07

Btw: I can remove the maxfileid references, I still get the correct result.

skerit 2009-12-18 01:15:58
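A date-only version of the query from the answer would presumably look like this (a sketch derived from the answer's query with the `maxfileid` parts dropped, not necessarily the exact statement skerit ran):

```sql
SELECT s1.*, f1.*
FROM stock s1
JOIN stockfile f1 USING (stockfileid)
JOIN (
    -- latest stockfile date per sku
    SELECT sku, MAX(date) AS maxdate
    FROM stock
    JOIN stockfile USING (stockfileid)
    GROUP BY sku
) AS dfi ON (s1.sku, f1.date) = (dfi.sku, dfi.maxdate);
```

One thing to watch: if two stockfiles share the same date and both contain a given sku, this returns two rows for that sku, so it is only safe when dates are unique per file.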

Not sure if you get it reliably, though. You could add another level of filtering, which would then probably add another second, but I'm too tired to try it now and am going to sleep. Happy hacking! ;-)

Michael Krelin - hacker 2009-12-18 01:17:55

(the note on reliability was about removing fileid)

Michael Krelin - hacker 2009-12-18 01:18:29

Answer 4

+2 A:

I'm not sure if this is something you could do with your app, but instead of computing the quantity for each sku every single time you run the query, it would be more efficient to store the sku and quantity in a separate table and then just update the data whenever a new stockfile is received. That way you incur the cost of calculating this once per stockfile and not once per query. It's a bit of an upfront cost to compute this, but it saves you a lot down the line.

TskTsk 2009-12-18 00:41:28
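A minimal sketch of that idea in MySQL follows. The table `current_stock`, its column types, the `quantity` column on `stock`, and the `@new_stockfileid` variable are all assumptions for illustration; the thread does not show the real schema.

```sql
-- Hypothetical summary table: one row per sku with its latest quantity.
CREATE TABLE current_stock (
    sku        VARCHAR(64) NOT NULL PRIMARY KEY,
    quantity   INT         NOT NULL,
    stock_date DATETIME    NOT NULL
);

-- Run once after each stockfile has been loaded.
SET @new_stockfileid = 123;  -- placeholder id of the file just imported

INSERT INTO current_stock (sku, quantity, stock_date)
SELECT s.sku, s.quantity, f.date
FROM stock s
JOIN stockfile f USING (stockfileid)
WHERE f.stockfileid = @new_stockfileid
ON DUPLICATE KEY UPDATE
    -- quantity is assigned first so it still sees the old stock_date
    quantity   = IF(VALUES(stock_date) >= current_stock.stock_date,
                    VALUES(quantity), current_stock.quantity),
    stock_date = GREATEST(current_stock.stock_date, VALUES(stock_date));
```

The date comparison is there because files may arrive out of order: an older file loaded late leaves the newer quantity untouched. Reading the latest stock then becomes a plain `SELECT * FROM current_stock`.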
