views:

59

answers:

3

Description

According to the explain command, there is a range that is causing a query to perform a full table scan (160k rows). How do I keep the range condition and reduce the scanning? I expect the culprit to be:

Y.YEAR BETWEEN 1900 AND 2009 AND

Code

Here is the code that has the range condition (the STATION_DISTRICT is likely superfluous).

SELECT                                                                  
  COUNT(1) as MEASUREMENTS,                                             
  AVG(D.AMOUNT) as AMOUNT,                                              
  Y.YEAR as YEAR,                                                       
  MAKEDATE(Y.YEAR,1) as AMOUNT_DATE                                     
FROM                                                                    
  CITY C,                                                               
  STATION S,                                                            
  STATION_DISTRICT SD,                                                  
  YEAR_REF Y FORCE INDEX(YEAR_IDX),                                     
  MONTH_REF M,                                                          
  DAILY D                                                               
WHERE                                                                   
  -- For a specific city ...                                            
  --                                                                    
  C.ID = 10663 AND                                                      

  -- Find all the stations within a specific unit radius ...
  --                                                        
  6371.009 *                                                
  SQRT(                                                     
    POW(RADIANS(C.LATITUDE_DECIMAL - S.LATITUDE_DECIMAL), 2) +
    (COS(RADIANS(C.LATITUDE_DECIMAL + S.LATITUDE_DECIMAL) / 2) *
     POW(RADIANS(C.LONGITUDE_DECIMAL - S.LONGITUDE_DECIMAL), 2)) ) <= 50 AND

  -- Get the station district identification for the matching station.
  --                                                                  
  S.STATION_DISTRICT_ID = SD.ID AND                                   

  -- Gather all known years for that station ...
  --                                            
  Y.STATION_DISTRICT_ID = SD.ID AND             

  -- The data before 1900 is shaky; insufficient after 2009.
  --                                                        
  Y.YEAR BETWEEN 1900 AND 2009 AND                          

  -- Filtered by all known months ...
  --                                 
  M.YEAR_REF_ID = Y.ID AND           

  -- Whittled down by category ...
  --
  M.CATEGORY_ID = '003' AND

  -- Into the valid daily climate data.
  --
  M.ID = D.MONTH_REF_ID AND
  D.DAILY_FLAG_ID <> 'M'
GROUP BY
  Y.YEAR

Update

The SQL is performing a full table scan, which results in MySQL performing a "copy to tmp table", as shown here:

+----+-------------+-------+--------+-----------------------------------+--------------+---------+-------------------------------+--------+-------------+
| id | select_type | table | type   | possible_keys                     | key          | key_len | ref                           | rows   | Extra       |
+----+-------------+-------+--------+-----------------------------------+--------------+---------+-------------------------------+--------+-------------+
|  1 | SIMPLE      | C     | const  | PRIMARY                           | PRIMARY      | 4       | const                         |      1 |             |
|  1 | SIMPLE      | Y     | range  | YEAR_IDX                          | YEAR_IDX     | 4       | NULL                          | 160422 | Using where |
|  1 | SIMPLE      | SD    | eq_ref | PRIMARY                           | PRIMARY      | 4       | climate.Y.STATION_DISTRICT_ID |      1 | Using index |
|  1 | SIMPLE      | S     | eq_ref | PRIMARY                           | PRIMARY      | 4       | climate.SD.ID                 |      1 | Using where |
|  1 | SIMPLE      | M     | ref    | PRIMARY,YEAR_REF_IDX,CATEGORY_IDX | YEAR_REF_IDX | 8       | climate.Y.ID                  |     54 | Using where |
|  1 | SIMPLE      | D     | ref    | INDEX                             | INDEX        | 8       | climate.M.ID                  |     11 | Using where |
+----+-------------+-------+--------+-----------------------------------+--------------+---------+-------------------------------+--------+-------------+

Answer

After using the STRAIGHT_JOIN:

+----+-------------+-------+--------+-----------------------------------+---------------+---------+-------------------------------+------+---------------------------------+
| id | select_type | table | type   | possible_keys                     | key           | key_len | ref                           | rows | Extra                           |
+----+-------------+-------+--------+-----------------------------------+---------------+---------+-------------------------------+------+---------------------------------+
|  1 | SIMPLE      | C     | const  | PRIMARY                           | PRIMARY       | 4       | const                         |    1 | Using temporary; Using filesort |
|  1 | SIMPLE      | S     | ALL    | PRIMARY                           | NULL          | NULL    | NULL                          | 7795 | Using where                     |
|  1 | SIMPLE      | SD    | eq_ref | PRIMARY                           | PRIMARY       | 4       | climate.S.STATION_DISTRICT_ID |    1 | Using index                     |
|  1 | SIMPLE      | Y     | ref    | PRIMARY,STAT_YEAR_IDX             | STAT_YEAR_IDX | 4       | climate.S.STATION_DISTRICT_ID | 1650 | Using where                     |
|  1 | SIMPLE      | M     | ref    | PRIMARY,YEAR_REF_IDX,CATEGORY_IDX | YEAR_REF_IDX  | 8       | climate.Y.ID                  |   54 | Using where                     |
|  1 | SIMPLE      | D     | ref    | INDEX                             | INDEX         | 8       | climate.M.ID                  |   11 | Using where                     |
+----+-------------+-------+--------+-----------------------------------+---------------+---------+-------------------------------+------+---------------------------------+

Related

Thank you!

+2  A: 

ONE Request... It looks like you KNOW your data. Add the keyword "STRAIGHT_JOIN" and see the results...

SELECT STRAIGHT_JOIN ... the rest of your query...

Straight-join tells MySql to DO IT AS I HAVE LISTED. So, your CITY table is the first in the FROM list, thus indicating you expect that to be your primary... Additionally, your WHERE clause of the CITY is the immediate filter. With that being said, it will probably fly through the rest of the query...

Hope it helps... Its worked for me with gov't data of millions of records queried and joined to 10+ lookup tables where mySql was trying to think for me.

DRapp
@DRapp: I'm surprised that MySQL optimized it incorrectly. It helped. Queries now take about half the time. Still need to optimize the MySQL server, but at least I don't need to worry about massive full table scans anymore.
Dave Jarvis
A: 

in order to do efficient between queries you are going to want a b tree index on your YEAR column. for example:

CREATE INDEX id_index USING BTREE ON YEAR_REF (YEAR);

BTREE indexes allow for efficient range queries, if this is in fact the root problem then having an index like this should get rid of the full table scan and have it only scan the part of the table that is in the range. read more about btrees on wikipedia

However, as with any optimisation advice, you should measure to make sure that you don't do more harm than good.

luke
Thanks, Luke. All the tables have indexes. Some of them have indexes on indexes, because I'm so paranoid. ;-) (Just kidding!)
Dave Jarvis
A: 

Can you change from searching within a radius to search in a bounding box?

You know the city so you can calculate a bounding box in your application.

Perhaps this

S.LATITUDE_DECIMAL >= latitude_lower and  
S.LATITUDE_DECIMAL <= latitude_upper and
S.LONGITUDE_DECIMAL >= longitude_lower and 
S.LONGITUDE_DECIMAL <= longitude_upper

could be a little faster?

ceteras
Was thinking about doing just that, but I do like circles. :-)
Dave Jarvis