Q: 

optimizing query

I have a table foo which records sightings of bird species. foo_id is its PK; the other relevant columns are s_date, latitude, and longitude. species_id is an FK. I have indexes on s_date, latitude, longitude, and species_id. Table foo has 20 million records and is growing. The following query gives me the 10 most recent species sightings within a given lat/long range. The query takes too long (sometimes 10+ minutes). How can I optimize it? I am using MySQL.

SELECT species_id, MAX(s_date) 
FROM foo 
WHERE latitude >= minlat 
    AND latitude <= maxlat 
    AND longitude >= minlon 
    AND longitude <= maxlon 
GROUP BY species_id 
ORDER BY MAX(s_date) DESC LIMIT 0, 10;
A: 

I understand that you have separate indexes on the fields that you mention. You may want to try adding a composite index (aka multiple-column index) on (latitude, longitude):

CREATE INDEX ix_foo_lat_lng ON foo (latitude, longitude);

You may want to run an EXPLAIN on your query to see what index(es) MySQL is using. Quoting from the MySQL Manual :: How MySQL Uses Indexes:

Suppose that you issue the following SELECT statement:

mysql> SELECT * FROM tbl_name WHERE col1=val1 AND col2=val2;

If a multiple-column index exists on col1 and col2, the appropriate rows can be fetched directly. If separate single-column indexes exist on col1 and col2, the optimizer will attempt to use the Index Merge optimization, or attempt to find the most restrictive index by deciding which index finds fewer rows and using that index to fetch the rows.
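As a rough illustration of the EXPLAIN step, the query from the question could be checked like this. The numeric bounds are placeholder values standing in for minlat, maxlat, minlon and maxlon; they are not from the original post.

```sql
-- Placeholder bounds; substitute your own minlat/maxlat/minlon/maxlon.
EXPLAIN
SELECT species_id, MAX(s_date)
FROM foo
WHERE latitude  >= 37.60   AND latitude  <= 37.90
  AND longitude >= -122.60 AND longitude <= -122.30
GROUP BY species_id
ORDER BY MAX(s_date) DESC
LIMIT 0, 10;
```

In the output, the `key` column shows which index MySQL actually chose, `rows` gives its estimate of how many rows it will examine, and `Extra` indicates whether a temporary table or filesort is needed for the GROUP BY and ORDER BY.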

You may also be interested in checking out the following presentation:

The author describes how you can use the Haversine Formula in MySQL to order by proximity and limit your searches to a defined range. He also describes how to avoid a full table scan for such queries, using traditional indexes on the latitude and longitude columns.
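The range-limiting idea is usually implemented as a bounding box around the search point. The following is only a sketch under assumptions: the search point (@lat, @lon) and the radius @dist in miles are hypothetical values, and the variable names are made up for illustration. The constant of roughly 69 is the number of miles in one degree of latitude, which is just the Earth's radius expressed per degree (2π × 3959 mi / 360 ≈ 69.1), so formulas written with the radius and formulas written with 69 describe the same thing.

```sql
-- Hypothetical search point and radius in miles (not from the original post).
SET @lat  = 37.77;
SET @lon  = -122.42;
SET @dist = 10;

-- Miles per degree of latitude, derived from an Earth radius of 3959 miles.
SET @miles_per_degree = 2 * PI() * 3959 / 360;

-- Latitude bounds: a degree of latitude spans the same distance everywhere.
SET @minlat = @lat - @dist / @miles_per_degree;
SET @maxlat = @lat + @dist / @miles_per_degree;

-- Longitude bounds: a degree of longitude shrinks by cos(latitude).
SET @minlon = @lon - @dist / ABS(COS(RADIANS(@lat)) * @miles_per_degree);
SET @maxlon = @lon + @dist / ABS(COS(RADIANS(@lat)) * @miles_per_degree);

-- The query from the question, restricted to the bounding box.
SELECT species_id, MAX(s_date) AS last_seen
FROM foo
WHERE latitude  BETWEEN @minlat AND @maxlat
  AND longitude BETWEEN @minlon AND @maxlon
GROUP BY species_id
ORDER BY last_seen DESC
LIMIT 10;
```

The box is slightly larger than the circle of radius @dist, so an exact distance filter (for example, the Haversine formula) would still have to be applied to the rows inside it if a true circular range is required.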



Daniel Vassallo
Thanks for the answer. Any suggestions for optimizing the GROUP BY / ORDER BY part?
androidharry
@androidharry: If the composite index on `(latitude, longitude)` works and restricts the result set to just a few rows, the `GROUP BY` should automatically be pretty fast. Right now it's slow because (judging by your comment above) your query is using only the `longitude` index, so the intermediate result set is very large.
Daniel Vassallo
I am already using something similar to what is shown in the presentation. I found the formula at http://www.movable-type.co.uk/scripts/latlong-db.html. It uses the Earth's radius for the calculation, while the presentation uses 69 miles. I was wondering which one is correct?
androidharry
Will partitioning help?
androidharry
@androidharry: I don't think it will. Are you still having performance problems?
Daniel Vassallo
I have partitioned the table, and I am now using "my_date > start_date AND my_date < end_date" in the WHERE clause to narrow things down. This has improved the response time. Do you have any idea about the Earth's radius vs. 69 miles question?
androidharry