views:

646

answers:

6

I learned a trick a while back from a DBA friend to speed up certain SQL queries. I remember him mentioning that it had something to do with how SQL Server compiles the query, and that the query path is forced to use the indexed value.

Here is my original query (takes 20 seconds):

select Part.Id as PartId, Location.Id as LocationId
 FROM Part, PartEvent PartEventOuter, District, Location 
WHERE 
    PartEventOuter.EventType = '600'   AND PartEventOuter.AddressId = Location.AddressId  
    AND Part.DistrictId = District.Id   AND Part.PartTypeId = 15   
    AND District.SubRegionId = 11   AND PartEventOuter.PartId = Part.Id  
    AND PartEventOuter.EventDateTime <= '4/28/2009 4:30pm'   
    AND NOT EXISTS (
      SELECT PartEventInner.EventDateTime  
      FROM PartEvent PartEventInner
      WHERE PartEventInner.PartId = PartEventOuter.PartId
       AND PartEventInner.EventDateTime > PartEventOuter.EventDateTime 
       AND PartEventInner.EventDateTime  <= '4/30/2009 4:00pm')

Here is the "optimized" query (less than 1 second):

select Part.Id as PartId, Location.Id as LocationId
 FROM Part, PartEvent PartEventOuter, District, Location 
WHERE 
    PartEventOuter.EventType = '600'   AND PartEventOuter.AddressId = Location.AddressId  
    AND Part.DistrictId = District.Id   AND Part.PartTypeId = 15   
    AND District.SubRegionId = 11   AND PartEventOuter.PartId = Part.Id  
    AND PartEventOuter.EventDateTime <= '4/28/2009 4:30pm'   
    AND NOT EXISTS (
      SELECT PartEventInner.EventDateTime  
      FROM PartEvent PartEventInner
      WHERE PartEventInner.PartId = PartEventOuter.PartId
       **AND EventType = EventType**
       AND PartEventInner.EventDateTime > PartEventOuter.EventDateTime 
       AND PartEventInner.EventDateTime  <= '4/30/2009 4:00pm')

Can anyone explain in detail why this runs so much faster? I'm just trying to get a better understanding of this.

+3  A: 

probably because you are getting a Cartesian product without your EventType = EventType

From WikiPedia: http://en.wikipedia.org/wiki/SQL

"[SQL] makes it too easy to do a Cartesian join (joining all possible combinations), which results in "run-away" result sets when WHERE clauses are mistyped. Cartesian joins are so rarely used in practice that requiring an explicit CARTESIAN keyword may be warranted. (SQL 1992 introduced the CROSS JOIN keyword that allows the user to make clear that a Cartesian join is intended, but the shorthand "comma-join" with no predicate is still acceptable syntax, which still invites the same mistake.)"

you are actually going through more rows than necessary with your first query.

http://www.fluffycat.com/SQL/Cartesian-Joins/

Robert Greiner
Can't be...there's no table qualifier on EventType, so it's comparing it to itself. Essentially a no-op (except for NULLS, that is).
Mark Brackett
In my first attempts to optimize this query, I did in fact convert everything to (left) inner joins on the related fields. However, this did not impact the execution time one bit.
Keith
A: 

Odd, do you have an index defined with both EventType and EventDateTime in it?

Edit:
Wait, is EventType a nullable column? Column = Column will evaluate to FALSE* if it's value is NULL. At least using the default SQL Server settings.

The safer equivalent would be EventType IS NOT NULL. See it that gives the same result speed-wise?


*: My T-SQL reference says it should evaluate to TRUE with ANSI_NULLS set to OFF, but my query window says otherwise. *confuzzled now*.
Any ruling? TRUE, FALSE, NULLor UNKNOWN? :) Gotta love 'binary' logic in SQL :(

Thorarin
No indexes on the PartEvent view's columns (on the tables the view is pulling from that is)
Keith
Will have to try "EventType IS NOT NULL" when I return to work Monday
Keith
Checking EventType IS NOT NULL did not speed up the query - it is still 20 seconds. Checking if EventType = EventType still takes < 1 second.
Keith
Have you compared execution plans to shed some light on this? I'd love to know what is going on..
Thorarin
A: 

SQL Server uses an index lookup if and only if all columns of this index are in the query.

Johan Buret
A: 

Every non-indexed column you add performs a table scan. If you narrow your query down earlier on in your WHERE clause, subsequent scans are faster. Thus by adding an Index scan, your table scans run against less data.

tsilb
+1  A: 

Are there a large number of records with EventType = Null?
Before you added the aditional restriction your subquery would have been returning all those Null records, which would then have to be scanned by the Not Exists predicate for every row in the outer query... So the more you restrict what the subquery returns, the fewer the rows that have to be scanned to verify the Not Exists...

If this is the issue, it would probably be even faster if you restricted the records to EventType = '600' in the subquery as well....

Select Part.Id as PartId, Location.Id as LocationId 
FROM Part, PartEvent PartEventOuter, District, Location 
WHERE PartEventOuter.EventType = '600'   
    AND PartEventOuter.AddressId = Location.AddressId      
    AND Part.DistrictId = District.Id   
    AND Part.PartTypeId = 15       
    AND District.SubRegionId = 11   
    AND PartEventOuter.PartId = Part.Id      
    AND PartEventOuter.EventDateTime <= '4/28/2009 4:30pm'       
    AND NOT EXISTS (SELECT PartEventInner.EventDateTime                  
                    FROM PartEvent PartEventInner
                    WHERE PartEventInner.PartId =  PartEventOuter.PartId
                       AND EventType = '600'                        
                       AND PartEventInner.EventDateTime > PartEventOuter.EventDateTime
                       AND PartEventInner.EventDateTime  <= '4/30/2009 4:00pm')
Charles Bretana
That is just as fast, however wrong for our result set... The subquery is correct as it was (any "EventType")
Keith
A: 

This sort of thing used to be a lot more common than it is now. Oracle 6 for instance used to be sensitive to the order in which you placed restrictions in the WHERE clauses. The reason why you're surprised is really because we've become so good at expecting the DB engine to always work out the best access path no matter how you structure your SQL. Oracle 6 & 7 (I switched to MSSQL after that) also had the hint extension which you could use to tell the database how it might like to construct the query plan.

In this specific case it's difficult to give a conclusive answer without seeing the actual query plans but I suspect the difference is you have a compound index which uses EventType which is not being employed for the first query but is for the second. This would be unusual in that I'd expect your first query to have used it anyway, so I suspect that the database statistics may be out of date, so

REGENERATE STATISTICS

then try again and post the results here.

Cruachan