Hi,

I'm using Firebird and created a table, called EVENTS. The columns are:

id (INT) | name (VARCHAR) | category (INT) | website (VARCHAR) | lat (DOUBLE) | lon (DOUBLE)

Now a user wants to search for events within a certain radius around him, but has entered only two or three letters of his home city. So we've got - let's say - 200 possible cities with their latitudes and longitudes, and my SQL query looks like:

SELECT id FROM events WHERE ((lat BETWEEN 30.09 AND 30.12) AND (lon BETWEEN 40.78 AND 40.81)) OR ((lat BETWEEN 30.59 AND 30.62) AND (lon BETWEEN 41.17 AND 41.20)) OR ...

So we get 200 constraints in the WHERE clause and it takes seconds to actually get the result.

I know the query might look horrible. But are the many constraints really the bottleneck? Can this query be optimized?

Thanks, Stefan

A: 

You could redesign the database (if that is possible) to store not only latitude and longitude, but also the name of the place of each event. Your query would then use a LIKE condition, or something similar (a begins-with match?). I know this might be an unusable solution, but constraining yourself to square (in the spherical sense) cities or regions seems a bit odd to me ;)
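
For example, with Firebird's STARTING WITH prefix operator (place_name is a hypothetical new column here, just to sketch the idea):

SELECT id
  FROM events
 WHERE place_name STARTING WITH 'Ham'  -- 'Hamburg', 'Hamm', ...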

samuil
This is not possible, as many cities have the same name, and many events are not in the city center but only in the administrative area of a city (we cannot assume that the user knows which city an outlying spot belongs to).
+2  A: 

My guess would be that the database engine decides that the criterion will likely return a lot of rows, so it wrongly full-scans the table. Hint it to do the right thing, or rewrite the query, e.g. like this (which might or might not help):

SELECT id
  FROM cities c
  JOIN events e ON (e.lat BETWEEN c.lat - .01 AND c.lat + .01) AND (e.lon BETWEEN c.lon - .01 AND c.lon + .01)
 WHERE c.name LIKE 'x%'

In SQL server you could write

SELECT id
  FROM cities c
  INNER LOOP JOIN events e ON (e.lat BETWEEN c.lat - .01 AND c.lat + .01) AND (e.lon BETWEEN c.lon - .01 AND c.lon + .01)
 WHERE c.name LIKE 'x%'

to ensure the correct plan (you do have an index on the lat and lon columns together?)
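
If you don't have that index yet, it would be something like this in Firebird (the index name is just an example):

CREATE INDEX idx_events_lat_lon ON events (lat, lon);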

erikkallen
But we would still have 200 constraints, right? We have 200 cities, so there are 200 c.lat/c.lon pairs.
Stefan - I see what you are worried about: that "c.name LIKE" will have to be replaced with an explicit list of cities determined by the application. It's possible that your performance bottleneck is linking events to 200 constraints, and that linking them to the cities table would actually make the query better, as the event constraint can then be resolved as a "single" one, possibly involving an index.
Joel Goodwin
@Stefan: If my query works as intended, you'll have one index seek on the cities table and 200 index seeks on the events table. This may (or may not) be better than the full scan on the events table that I guess your original code does. It will certainly be faster when you have many cities with just a few events per city. It will also make life easier for the parser/optimizer, but parsing/optimizing doesn't usually take a measurable time.
erikkallen
You could also try adding c.lower_lat and c.upper_lat columns that are automatically kept equal to lat - .01 / lat + .01 via a trigger, then create an index on those columns to improve the query (and do the same for lon). It's possible that you'd only need the index on one side (say lower_lat and lower_lon); having both is probably overkill. Alternatively, I think Firebird offers an "expression index", which can index a computed expression directly.
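
A rough sketch of the expression-index variant, assuming Firebird 2.0 or later and example index names (using the same .01 box size as the queries above):

CREATE INDEX idx_cities_lower_lat ON cities COMPUTED BY (lat - 0.01);
CREATE INDEX idx_cities_lower_lon ON cities COMPUTED BY (lon - 0.01);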
Joel Goodwin
@Joel: then you'll need to index 4 columns in the same index and not 2, which makes the index larger for no gain.
erikkallen
@erikkallen - The problem with using +/- on the city lat/lon is that I don't think a city lat/lon index would be used (correct me if you think I'm wrong) -- but if you indexed the operated form (lower_lat/lower_lon), you would get it. I was just seeing this as an additional level of performance improvement if there are too many city hits.
Joel Goodwin
@Joel: Sorry, I misread your comment as suggesting doing lower/upper on the events and not on the city, so my previous comment was wrong. The query should not use any lat/lon index on the cities table but on the events one. However, unless there's something really special with the OP's database engine, there would be no need for the lower/upper columns.
erikkallen
A: 

Create a range-search-friendly index (a B-tree index) on events.lat and/or events.lon (but not a single index on both!). That will at least get you in the ballpark.
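
In Firebird that would be something like this (index names are just examples):

CREATE INDEX idx_events_lat ON events (lat);
CREATE INDEX idx_events_lon ON events (lon);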

What you really want is an R-Tree or similar, which allows indexing multi-dimensional data and gives you good range search performance. PostgreSQL has GiST for that; I don't know what kind of support Firebird has for this sort of problem.
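
For comparison only (this is PostgreSQL, not Firebird), a GiST index over a point expression could look like:

CREATE INDEX idx_events_point ON events USING GIST (point(lon, lat));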

Wiki links for more info: http://en.wikipedia.org/wiki/R-tree http://en.wikipedia.org/wiki/GiST

SquareCog
A: 

You should first run your query through IBExpert and check its plan to see why it is so slow.
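
If you don't have IBExpert, Firebird's isql tool can show the plan as well, e.g.:

-- in isql: print the plan without executing the statement
SET PLANONLY;
SELECT id FROM events WHERE (lat BETWEEN 30.09 AND 30.12) AND (lon BETWEEN 40.78 AND 40.81);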

FractalizeR
+1  A: 

Trade space for speed:

Cities don't move. Whenever you add an event, you can pre-calculate the distance between each event and each city, and store the distance to all nearby cities. You can index this by city, so you can directly find events somewhat near a given city (or near 200 cities with the same prefix). Actual longitude/latitude filtering can then be restricted to a much smaller set of events.
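
A sketch of such a precomputed link table; all names and the distance unit are assumptions:

CREATE TABLE event_city (
  event_id INT,
  city_id INT,
  distance_km DOUBLE PRECISION
);
CREATE INDEX idx_event_city ON event_city (city_id, distance_km);

-- events within ~20 km of any city matching the prefix
SELECT DISTINCT e.id
  FROM event_city ec
  JOIN cities c ON c.id = ec.city_id
  JOIN events e ON e.id = ec.event_id
 WHERE c.name STARTING WITH 'x'
   AND ec.distance_km < 20;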

MSalters
This is exactly what we did in one of our applications. Populating the initial table took some time, but once we did, searching for things within a 20-mile radius (or whatever the mile condition was) became very fast.
HLGEM
A: 

Try a correlated subquery:

select *
from events e
where exists
( select *
  from cities c
  where c.name like 'X%' and
        e.lat BETWEEN c.lat - .01 AND c.lat + .01 and
        e.lon BETWEEN c.lon - .01 AND c.lon + .01
)

In some scenarios it works faster than joins.

Lluis Martinez