I'm no database expert, but I have enough knowledge to get myself into trouble, as is the case here. This query

SELECT DISTINCT p.* 
  FROM points p, areas a, contacts c 
 WHERE (    p.latitude > 43.6511659465 
        AND p.latitude < 43.6711659465 
        AND p.longitude > -79.4677941889 
        AND p.longitude < -79.4477941889) 
   AND p.resource_type = 'Contact' 
   AND c.user_id = 6

is extremely slow. The points table has fewer than 2000 records, but the query takes about 8 seconds to execute. There are indexes on the latitude and longitude columns. Removing the clauses concerning resource_type and user_id makes no difference.

The latitude and longitude fields are both formatted as number(15,10) -- I need the precision for some calculations.

There are many, many other queries in this project where points are compared, but no execution time problems. What's going on?

+10  A: 

Did you forget something from your actual query? It's missing the join conditions between the three tables (with the comma-separated ANSI-89 syntax they belong in the WHERE clause), giving you a cartesian product but only pulling out the POINTS records.

OMG Ponies
Not only that but `AND c.user_id=6` is doing nothing, since no results from `contacts` are returned.
VeeArr
@user315975: I dunno your data, but don't include tables if they serve absolutely no purpose. Still need to know how `POINTS` and `CONTACTS` relate...
OMG Ponies
@user315975: It'll probably be more worthwhile to analyze the performance of a query that makes sense.
Thanatos
Yep! What's happening is he is getting all the possible combinations of the points, areas and contacts rows, which are then being sorted to remove the duplicates as directed by the "DISTINCT" clause.
James Anderson
Not enough sleep, and some bad thinking on my part. Made the assumption that naming the tables would not generate the joins unless they were included in the clauses that relate them. Problem solved. Will post completed code if any are interested.
+5  A: 

You're joining three tables, p, a, and c, but you aren't specifying how to attach them together. What you're getting is a full Cartesian join between all of the rows in points and contacts that match the criteria, multiplied by every row in areas.
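To get a feel for how quickly an unconstrained comma join multiplies rows, here is a small sketch, assuming PostgreSQL. The demo_* tables are hypothetical stand-ins with 20 rows each, not the asker's schema.

```sql
-- Hypothetical stand-in tables with 20 rows each (not the real schema).
CREATE TEMP TABLE demo_points   AS SELECT id FROM generate_series(1, 20) AS s(id);
CREATE TEMP TABLE demo_areas    AS SELECT id FROM generate_series(1, 20) AS s(id);
CREATE TEMP TABLE demo_contacts AS SELECT id FROM generate_series(1, 20) AS s(id);

-- No join conditions: every row is paired with every other row.
-- 20 * 20 * 20 = 8000 intermediate rows.
SELECT count(*) AS intermediate_rows
  FROM demo_points p, demo_areas a, demo_contacts c;

-- DISTINCT p.* then has to collapse those 8000 rows back down to 20.
SELECT count(*) AS distinct_rows
  FROM (SELECT DISTINCT p.*
          FROM demo_points p, demo_areas a, demo_contacts c) AS d;
```

The same multiplication applies at real table sizes, which is where the time goes.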

You probably want to attach something in points to something in areas. And something in contacts with ... well, I don't know what your schema looks like.

Try sticking an "EXPLAIN" at the beginning for information on what's happening.
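As a sketch of what that looks like with the query from the question, assuming PostgreSQL:

```sql
-- Plain EXPLAIN only shows the planner's chosen plan; it does not run the query.
EXPLAIN
SELECT DISTINCT p.*
  FROM points p, areas a, contacts c
 WHERE (    p.latitude > 43.6511659465
        AND p.latitude < 43.6711659465
        AND p.longitude > -79.4677941889
        AND p.longitude < -79.4477941889)
   AND p.resource_type = 'Contact'
   AND c.user_id = 6;

-- EXPLAIN ANALYZE actually executes the query and reports real timings,
-- so expect it to take as long as the slow query itself.
```

A cross product typically shows up as nested loop nodes with no join condition and very large estimated row counts.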

Charles
Indeed. You might only have 2000 records in points, but if you have 2000 in areas and 2000 in contacts as well, you're generating 2000 * 2000 * 2000 = 8 billion rows, then sorting them back into distinct.
Cowan
+2  A: 

You are probably missing the joins. Joining the tables would look something like this (the foreign key column names are guesses).

SELECT DISTINCT p.* 
  FROM points p
  JOIN areas a ON a.FkPoint = p.id
  JOIN contacts c ON c.FkArea = a.id
 WHERE (    p.latitude > 43.6511659465 
        AND p.latitude < 43.6711659465 
        AND p.longitude > -79.4677941889 
        AND p.longitude < -79.4477941889) 
   AND p.resource_type = 'Contact' 
   AND c.user_id = 6

For better indexing of coordinates, use a quadtree or R-tree index implementation.

If you left out the joins intentionally, try a subquery like this.

SELECT DISTINCT thePoints.*
FROM (
    SELECT DISTINCT p.* 
    FROM points p
    WHERE (     p.latitude > 43.6511659465 
            AND p.latitude < 43.6711659465 
            AND p.longitude > -79.4677941889 
            AND p.longitude < -79.4477941889) 
    AND p.resource_type = 'Contact' 
) AS thePoints
, areas a, contacts c
WHERE c.user_id = 6
A: 

You need an R-tree index and the @ operator; a normal index won't work.

R-Tree http://www.postgresql.org/docs/8.1/static/indexes-types.html

@ operator http://www.postgresql.org/docs/8.1/static/functions-geometry.html

J-16 SDiZ
R-tree indices don't exist in 8.3+.
rfusca
Well, there are GiST indices (which implement R-trees for the geometric types, I think).
araqnid
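For what the GiST route might look like on a recent PostgreSQL, here is a rough sketch. It assumes the points table from the question with its numeric latitude and longitude columns; the index name is made up for illustration, and the built-in point type with the <@ ("contained in") operator is used in place of the older @ spelling.

```sql
-- Hypothetical index name. Expression index over a point built from the two columns;
-- the numeric values are cast to float8 because point() takes double precision.
CREATE INDEX points_location_gist
    ON points
 USING gist (point(longitude::float8, latitude::float8));

-- The query has to use the same expression for the index to be considered.
-- Note that <@ includes the box edges, unlike the strict > and < in the question.
SELECT p.*
  FROM points p
 WHERE point(p.longitude::float8, p.latitude::float8)
       <@ box(point(-79.4677941889, 43.6511659465),
              point(-79.4477941889, 43.6711659465))
   AND p.resource_type = 'Contact';
```

With fewer than 2000 rows this is unlikely to matter much; the missing join conditions are the real cost.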