I'm wondering if someone can explain how IN is evaluated. Ultimately, I'm trying to find out why this query is slow and how to optimize it. I waited over 3 minutes, and when I cancelled the query it had only returned 1000 rows, which doesn't seem like it should take that long.

SELECT t2.* 
FROM report_tables.roc_test_results as t2 
WHERE t2.job IN (SELECT DISTINCT(t1.job) 
                   FROM report_tables.roc_test_results as t1 
                  WHERE t1.operation = 'TEST' 
                    AND result = 'Passed' 
                    AND STR_TO_DATE(t1.date_created,'%d-%M-%Y') BETWEEN '2009-10-01' 
                                                                    AND '2009-10-31')

I'm not sure what the full query should return; if I had to guess, I would say around 2000 records. The subquery returns 332 rows (336 without DISTINCT).

Can anyone give me some pointers on how to optimize this query? Also, I'm wondering, does the subquery calculate every time or just once and store it?

As requested, the results for DESC... (by the way, please don't laugh, I am self taught so I'm sure this table is hideously designed.)

Field             Type                 Null  Key  Default  Extra
----------------  -------------------  ----  ---  -------  -----
operation         varchar(10)          NO
tester            varchar(25)          NO
result            varchar(45)          NO
fail_mode         varchar(45)          NO
primary_failure   varchar(25)          NO
ref_des           varchar(45)          NO
rf_hours          varchar(15)          NO
ac_hours          varchar(15)          NO
comments          text                 NO
job               varchar(15)          NO
rma               bigint(20) unsigned  NO
item              varchar(45)          NO
item_description  text                 NO
serial            varchar(25)          NO
created_by        varchar(25)          NO
collection        bigint(20) unsigned  NO    PRI
date_created      varchar(15)          NO
A: 

First of all, you don't need the DISTINCT in the subquery, since IN eliminates duplicates anyhow. Do you need the function call in the WHERE clause, and do you have an index on the date_created column?

What happens when you change

WHERE STR_TO_DATE(t1.date_created,'%d-%M-%Y') 
BETWEEN '2009-10-01' AND '2009-10-31')

to

WHERE t1.date_created >= '2009-10-01' 
AND t1.date_created < '2009-11-01'

An index on a column often won't be used if the column is wrapped in a function.

SQLMenace
Yes, I need the WHERE clause. Basically it's pulling all of the RMAs that shipped in that period, and then I'm gathering all historical data related to that list of RMAs. I do have an index, but it's basically an auto-incremented index; how would I incorporate that? Sorry, I am self taught so I'm not too savvy with the details. I read the manual on indexes but still didn't see how I could incorporate an index that doesn't relate to the data.
Geoff
I asked if you needed the function, not the WHERE clause. The WHERE clause I posted should be equivalent to the one you have but should run much faster.
SQLMenace
STR_TO_DATE converts a string to a date, implying that `date_created` is a VARCHAR/TEXT/etc data type so the function would be necessary.
OMG Ponies
Oh, sorry, yes, I do need it. I didn't know how to use SET with LOAD DATA INFILE, so the date is stored as a string that looks like 01-Oct-09. Unless there is a way to still use that format without the function?
Geoff
Also, probably worth mentioning: I have applications that use this data, so I can't really change it because it would break the applications that assume it is in the D-Mon-Year format.
Geoff
A: 

My advice is to replace the IN with a JOIN, and then consider adding indexes on some of your columns, such as job, and maybe operation and/or result. You should read up on indexes in the MySQL manual, and also on using EXPLAIN to optimize your queries:

http://dev.mysql.com/doc/refman/5.1/en/indexes.html

http://dev.mysql.com/doc/refman/5.1/en/using-explain.html
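
As a rough sketch of how EXPLAIN could be applied here, prefix the original query with EXPLAIN to see the plan without running it. On MySQL versions of this era (5.1), an IN subquery is often reported with select_type DEPENDENT SUBQUERY, which means it is re-evaluated for each row of the outer query instead of being computed once. Whether that is what happens on this server is something only the output can confirm.

```sql
-- Shows the execution plan only; the query itself is not run.
-- Things to look at: select_type (DEPENDENT SUBQUERY?), key (NULL means
-- no index is used), and rows (estimated rows examined per step).
EXPLAIN
SELECT t2.*
FROM report_tables.roc_test_results AS t2
WHERE t2.job IN (SELECT t1.job
                   FROM report_tables.roc_test_results AS t1
                  WHERE t1.operation = 'TEST'
                    AND t1.result = 'Passed'
                    AND STR_TO_DATE(t1.date_created, '%d-%M-%Y')
                        BETWEEN '2009-10-01' AND '2009-10-31');
```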

Here's an example of converting the IN to a JOIN:

SELECT distinct t2.* 
FROM roc_test_results as t2
inner join roc_test_results as t1 on t1.job = t2.job
WHERE t1.operation = 'TEST' 
AND t1.result = 'Passed' 
AND STR_TO_DATE(t1.date_created,'%d-%M-%Y') BETWEEN '2009-10-01' AND '2009-10-31';
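
Another variation, offered as a sketch and not as a tested fix: move the filtering query into a derived table and join to it. MySQL materializes a derived table in the FROM clause once, so the list of jobs is built a single time and then matched against t2. The alias passed_jobs and the index name idx_roc_job are made up for illustration.

```sql
-- Hypothetical index name; lets the join find rows by job
-- without scanning the whole table.
CREATE INDEX idx_roc_job ON report_tables.roc_test_results (job);

-- DISTINCT stays in the derived table so that each job appears once
-- and rows from t2 are not duplicated by the join.
SELECT t2.*
FROM report_tables.roc_test_results AS t2
INNER JOIN (SELECT DISTINCT t1.job
              FROM report_tables.roc_test_results AS t1
             WHERE t1.operation = 'TEST'
               AND t1.result = 'Passed'
               AND STR_TO_DATE(t1.date_created, '%d-%M-%Y')
                   BETWEEN '2009-10-01' AND '2009-10-31') AS passed_jobs
        ON passed_jobs.job = t2.job;
```

The STR_TO_DATE call still forces a scan inside the derived table, but that scan happens once, and the outer lookup by job can use the index.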
Ike Walker
STR_TO_DATE converts a string to a date, implying that `date_created` is a VARCHAR/TEXT/etc data type so the function would be necessary. An index on the column won't be used because of converting to a different data type.
OMG Ponies
Forgive me if this sounds like a dumb question, but if the date isn't the leftmost prefix of the index and I'm not including the primary index, how will that help? Also, I'm not sure your second paragraph will return what I'm looking for. I first gather a list of RMAs that have passed TEST in the date range, then I need all rows from any date range that pertain to the RMAs that passed TEST in the date range.
Geoff
@OMG Ponies: You are correct. I didn't look closely enough at the function. I just assumed it was converting a DATETIME to a DATE. So the index won't help.
Ike Walker
@Geoff: The index won't help. I didn't look closely enough at your query. See my previous comment. As for the way I rewrote the query, again I didn't look closely enough at the original, so you are correct that it won't give you what you want. I'll rewrite it with a join.
Ike Walker
A: 

The date_created column needs to be changed to DATETIME before it's worth defining an index on it. An index on the column is worthless as long as every query converts the value from a string to a date with STR_TO_DATE, as yours currently does.

You've mentioned that you're using LOAD DATA INFILE, and that the source file contains dates in DD-MON-YY format. MySQL will implicitly convert strings into DATETIME if the YY-MM-DD format is used, so if you can correct this in your source file before using LOAD DATA INFILE, the rest should fall into place.
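
If the source file can't be changed, the conversion can probably be done during the load instead, by reading the date into a user variable and converting it in a SET clause. This is only a sketch: the file path, the comma delimiter, and the column order are assumptions, collection is left out on the guess that it is generated automatically, and the format string assumes values like 01-Oct-09.

```sql
LOAD DATA INFILE '/path/to/roc_test_results.csv'   -- hypothetical path
INTO TABLE report_tables.roc_test_results
FIELDS TERMINATED BY ','                            -- assumed delimiter
(operation, tester, result, fail_mode, primary_failure, ref_des,
 rf_hours, ac_hours, comments, job, rma, item, item_description,
 serial, created_by, @raw_date)                     -- assumed column order
-- %d = day of month, %b = abbreviated month name, %y = two-digit year
SET date_created = STR_TO_DATE(@raw_date, '%d-%b-%y');
```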

After that, a covering index using:

  • job
  • operation
  • result
  • date_created

...would be a good idea.
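
If the existing varchar column has to stay for the applications that read it, one possible route (untested against this table) is to add a real DATE column next to it and index that one. The names date_created_dt and idx_roc_cover are invented for the example, and the format string assumes values such as 01-Oct-09.

```sql
-- Add a real DATE column alongside the existing varchar one,
-- so applications reading date_created keep working.
ALTER TABLE report_tables.roc_test_results
  ADD COLUMN date_created_dt DATE NULL;

-- Backfill it once from the string values.
UPDATE report_tables.roc_test_results
   SET date_created_dt = STR_TO_DATE(date_created, '%d-%b-%y');

-- Index over the four columns listed above.
CREATE INDEX idx_roc_cover
    ON report_tables.roc_test_results
       (job, operation, result, date_created_dt);

-- The filter can then compare the column directly, with no function call,
-- and every column it touches is present in the index.
SELECT DISTINCT t1.job
  FROM report_tables.roc_test_results AS t1
 WHERE t1.operation = 'TEST'
   AND t1.result = 'Passed'
   AND t1.date_created_dt BETWEEN '2009-10-01' AND '2009-10-31';
```

Rows loaded after the backfill would also need date_created_dt filled in, for example in the SET clause of LOAD DATA INFILE.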

OMG Ponies
Correct, I know now that I can change the format during the LOAD DATA INFILE using @ and SET, but when I added the original data I did not. Since I already have applications that depend on its format, I'm kind of stuck. I either have to leave it the way it is and wait forever for the results while also having a horribly indexed table, or I have to go through my applications and change all of the queries by removing the STR_TO_DATE, since it won't be needed anymore (I'll probably do the latter). This covering index looks very interesting, thanks for pointing that out.
Geoff