views:

108

answers:

4

I was inspired by the good answers from my previous question about SQL. Now this SQL is run on a DB with Interbase 2009. It is about 21 GB in size.

SELECT DistanceAsMeters, AddrDistance.Bold_Id, AddrDistance.Created, AddressFrom.CityName_CO as FromCity, AddressTo.CityName_CO as ToCity
FROM AddrDistance
LEFT JOIN Address AddressFrom ON AddrDistance.FromAddress = AddressFrom.Bold_Id
LEFT JOIN Address AddressTo ON AddrDistance.ToAddress = AddressTo.Bold_Id
Where  DistanceAsMeters = 0 and PseudoDistanceAsCostKm = 0
       and not AddrDistance.bold_id in (select bold_id from DistanceQueryTask)
Order By Created Desc

There are 840000 rows with AddrDistance 190000 rows with Address and 4 with DistanceQueryTask.

The question is, can this be done faster? I guess, the same query is run many times select bold_id from DistanceQueryTask. Note that I'm not interested in stored procedures, just plain SQL :)

EDIT1 Here is the current execution plan:

Statement: SELECT DistanceAsMeters, AddrDistance.Bold_Id, AddrDistance.Created, AddressFrom.CityName_CO as FromCity, AddressTo.CityName_CO as ToCity
FROM AddrDistance
LEFT JOIN Address AddressFrom ON AddrDistance.FromAddress = AddressFrom.Bold_Id
LEFT JOIN Address AddressTo ON AddrDistance.ToAddress = AddressTo.Bold_Id
Where  DistanceAsMeters = 0 and PseudoDistanceAsCostKm = 0
       and not AddrDistance.bold_id in (select bold_id from DistanceQueryTask)
Order By Created Desc

PLAN (DISTANCEQUERYTASK INDEX (RDB$PRIMARY218))
PLAN SORT (JOIN (JOIN (ADDRDISTANCE NATURAL,ADDRESSFROM INDEX (RDB$PRIMARY234)),ADDRESSTO INDEX (RDB$PRIMARY234)))

And yes, DistanceQueryTask is meant to have a low number if rows in the database.

+2  A: 

Using Left Join and subqueries will slow down any query.

You can get some improvements with the correct indexes (on Bold_id, DistanceMeters, PseudoDistanceAsCostKm ) remember that more indexes increase the size of the database

Daniel Luyo
The biggest improvement come when indexing those fields, thanks.
Roland Bengtsson
+2  A: 

I suppose bold_id is your key, and thus properly indexed.
Then replacing the subselect and the not...in by a join might help the optimizer.

SELECT DistanceAsMeters, Bold_Id, Created, AddressFrom.CityName_CO as FromCity, AddressTo.CityName_CO as ToCity
FROM AddrDistance
LEFT JOIN Address AddressFrom ON AddrDistance.FromAddress = AddressFrom.Bold_Id
LEFT JOIN Address AddressTo ON AddrDistance.ToAddress = AddressTo.Bold_Id
LEFT JOIN DistanceQueryTask ON AddrDistance.bold_id = DistanceQueryTask.bold_id
Where  DistanceAsMeters = 0 and PseudoDistanceAsCostKm = 0
  and DistanceQueryTask.bold_id is null
Order By Created Desc
François
This query give 0 rows, probably because DistanceQueryTask.bold_id is never null as it is the key to identify rows. Apart from that the bold_id of AddrDistance and DistanceQueryTask can never be the same.
Roland Bengtsson
@Roland: A left join combined with a `where right_table.Key is null` gives only the records of the left table that have no match in the right table (those where you would have the right_table columns null without the where clause)
François
+2  A: 

Create an index for this part: (DistanceAsMeters = 0 and PseudoDistanceAsCostKm = 0) because it does a (bad) table scan for it: ADDRDISTANCE NATURAL

And try to use the join instead of subselect as stated by Francois.

André
+2  A: 

As Daniel and Andre sugges an index helps a lot.
I would suggest this index (DistanceMeters, PseudoDistanceAsCostKm, Bold_id), because the first 2 parts of the index is constant, then its a smal portion of the index that is needed to read.

If it is a fact that FromAddress and/or ToAddress exist you can change the LEFT JOIN to INNER JOIN, because it is often faster (the query optimizer can make some assumptions).

BennyBechDk