I have a problem with the SQL statement detailed below. The query returns the results I need, but takes an insane amount of time to execute. I now have so many records in the db that the page generally won't load.

SELECT dscan.guid, dscan.drive, dscan.folder, dscan.filename, source.guid  
FROM source 
RIGHT JOIN dscan ON (
  (source.guid & '_dtr' = dscan.guid OR source.guid & '_dto' = dscan.guid OR source.guid = dscan.guid)  
  AND dscan.guid LIKE '%" & Replace(strSearch_guid, "'", "''") & "%'  
  AND dscan.filename NOT LIKE '.[_]%'  
  AND dscan.drive = 'Z:')  
WHERE source.guid Is Null  
ORDER BY dscan.drive, dscan.guid

From what I've been able to find online, ORs in JOIN statements are a problem, but I can't figure out how to fix this.

I'm comparing database records against filenames to identify errors - but the filenames sometimes have '_dtr' or '_dto' appendages that I have to take into consideration.

A: 

I think you should try dissecting it a bit with some parentheses.

((source.guid & '_dtr' = dscan.guid) OR (source.guid & '_dto' = dscan.guid) OR (source.guid = dscan.guid))

Kenneth Reitz
Out of sheer desperation I played around with parentheses like this and it made no difference. The problem is the ORs which, as Charles Bretana says, force table scans.
A: 

ORs are notorious performance suckers. Try using an IN clause instead:

SELECT dscan.guid, dscan.drive, dscan.folder, dscan.filename, source.guid  
FROM source 
RIGHT JOIN dscan ON (
  dscan.guid in (source.guid & '_dtr', source.guid & '_dto', source.guid)
  AND dscan.guid LIKE '%" & Replace(strSearch_guid, "'", "''") & "%'  
  AND dscan.filename NOT LIKE '.[_]%'  
  AND dscan.drive = 'Z:')  
WHERE source.guid Is Null  
ORDER BY dscan.drive, dscan.guid

But, best yet, use your query execution plan to truly get a sense as to what the database engine is doing. You can then see where the real bottlenecks are, and possibly what indices you can add to speed up your query.
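
As a rough illustration of reading a plan, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in engine. That is an assumption: the query in the question targets a different database, where the plan tooling and the `&` operator differ. The table layout and the index name `idx_dscan_guid` are hypothetical. It prints the plan for a plain equality join and for a leading-wildcard LIKE filter.

```python
import sqlite3

# Stand-in engine: SQLite, not the asker's database. The table layout and
# index name are hypothetical, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE source (guid TEXT);
    CREATE TABLE dscan (guid TEXT, drive TEXT, folder TEXT, filename TEXT);
    CREATE INDEX idx_dscan_guid ON dscan (guid);
""")

def plan(sql):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[3] for row in rows]

# Plain equality join: the engine is free to look rows up through an index.
equality_plan = plan(
    "SELECT d.guid FROM source s JOIN dscan d ON d.guid = s.guid"
)

# Leading wildcard: no index lookup is possible, every row is visited.
wildcard_plan = plan(
    "SELECT guid FROM dscan WHERE guid LIKE '%abc%'"
)

print(equality_plan)
print(wildcard_plan)
```

In SQLite's output a SEARCH step means an index lookup and a SCAN step means a pass over every row. The exact wording varies between versions, and other engines present their plans differently.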

Eric
I had tried an IN but had the same result - the query takes so long to execute that the page won't load.
A: 

Perhaps you could re-order things. You could try:

  • selecting from dscan first (I'm assuming the right join on dscan means you want all the rows from it). You may not even need a right join after that.
  • re-ordering your ON clause, e.g. put the comparisons most likely to fail first to take advantage of short-circuiting. Put all the ANDs first and the ORs last
  • move some comparisons from the ON clause to the WHERE clause
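
One thing worth checking before moving comparisons around: in an outer join, a filter on the preserved table behaves differently in ON than in WHERE. The sketch below shows this with Python's built-in sqlite3 module as a stand-in engine (an assumption, not the asker's database), using invented sample rows and a simplified equality join.

```python
import sqlite3

# SQLite stands in for the asker's database; the sample rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE source (guid TEXT);
    CREATE TABLE dscan (guid TEXT, drive TEXT, folder TEXT, filename TEXT);
    INSERT INTO source VALUES ('a1');
    INSERT INTO dscan VALUES
        ('a1', 'Z:', 'f', 'a1.tif'),   -- matches source
        ('b2', 'Z:', 'f', 'b2.tif'),   -- on Z:, no match in source
        ('c3', 'Y:', 'f', 'c3.tif');   -- no match, and on another drive
""")

# Drive filter inside ON: it only limits which rows can match, so the
# Y: row is still preserved by the outer join and passes the IS NULL test.
filter_in_on = conn.execute("""
    SELECT d.guid FROM dscan d
    LEFT JOIN source s ON s.guid = d.guid AND d.drive = 'Z:'
    WHERE s.guid IS NULL
    ORDER BY d.guid
""").fetchall()

# Drive filter in WHERE: dscan rows on other drives are removed from the result.
filter_in_where = conn.execute("""
    SELECT d.guid FROM dscan d
    LEFT JOIN source s ON s.guid = d.guid
    WHERE s.guid IS NULL AND d.drive = 'Z:'
    ORDER BY d.guid
""").fetchall()

print(filter_in_on)     # [('b2',), ('c3',)]
print(filter_in_where)  # [('b2',)]
```

So moving a dscan filter from ON to WHERE can change which rows come back, not just how fast they arrive. Whether that is wanted depends on the data.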
dave
A: 

In fact, I don't think your performance issue comes from the ORs themselves, but mainly from concatenating strings with column values to build the join condition.

Joining on string data does not give the best performance either. On the other hand, indexing the columns would help, provided you were no longer concatenating strings in the comparison.

As a solution, I don't know whether it is possible or desirable to add columns to that table holding versions of the strings with the "extensions" already added, so the query would not need to concatenate them.

Those are workarounds, not the real solution.

MaxiWheat
Adding a column to the dscan table is my backup option. A SQL solution will save me some hassles though.
Thanks to all who tried to help me with this one. I was hoping for a SQL magic bullet, but I don't think there is one. I've opted for adding a column to the dscan table which holds the '_dtr' and '_dto' data, which allows me to run the query with a straight source.guid = dscan.guid.
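
For anyone landing here later, this is a minimal sketch of one possible shape for that change, not the exact schema used. It assumes the suffix is split off when the scan results are loaded. The column name `suffix`, the helper `split_guid`, the index name and the sample rows are all hypothetical, and SQLite (via Python's built-in sqlite3 module) stands in for the real database.

```python
import sqlite3

SUFFIXES = ("_dtr", "_dto")

def split_guid(scanned):
    """Split a scanned name into (bare guid, suffix); suffix is '' if absent."""
    for suffix in SUFFIXES:
        if scanned.endswith(suffix):
            return scanned[: -len(suffix)], suffix
    return scanned, ""

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE source (guid TEXT);
    -- 'suffix' is a hypothetical name for the added column.
    CREATE TABLE dscan (guid TEXT, suffix TEXT, drive TEXT, filename TEXT);
    CREATE INDEX idx_source_guid ON source (guid);
    INSERT INTO source VALUES ('a1'), ('b2');
""")

# Invented scan results; the suffix is separated before the row is stored.
scanned = ["a1", "b2_dtr", "c3_dto", "d4"]
conn.executemany(
    "INSERT INTO dscan VALUES (?, ?, 'Z:', ?)",
    [(*split_guid(name), name + ".tif") for name in scanned],
)

# The join is now a plain equality test with no concatenation and no ORs.
unmatched = conn.execute("""
    SELECT d.guid, d.suffix FROM dscan d
    LEFT JOIN source s ON s.guid = d.guid
    WHERE s.guid IS NULL AND d.drive = 'Z:'
    ORDER BY d.guid
""").fetchall()

print(unmatched)  # [('c3', '_dto'), ('d4', '')]
```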
+1  A: 

Your use of constructed predicate comparison values, and LIKE with wildcards at the beginning, will require complete table scans. This will be a major performance hit for large tables until you redesign your schema to eliminate it. However, you can eliminate the performance hit from the ORs by unioning three separate SQL statements instead. Try this:

    SELECT D.guid, D.drive, D.folder, D.filename, S.guid  
    FROM dscan D Left Join source S
        ON S.guid & '_dtr' = D.guid 
          AND D.guid LIKE '%" & Replace(strSearch_guid, "'", "''") & "%'   
          AND D.filename NOT LIKE '.[_]%'    
          AND D.drive = 'Z:'
    WHERE S.guid Is Null  
  Union
    SELECT D.guid, D.drive, D.folder, D.filename, S.guid  
    FROM dscan D Left Join source S
        ON S.guid & '_dto' = D.guid  
          AND D.guid LIKE '%" & Replace(strSearch_guid, "'", "''") & "%'   
          AND D.filename NOT LIKE '.[_]%'    
          AND D.drive = 'Z:'
    WHERE S.guid Is Null  
  Union
    SELECT D.guid, D.drive, D.folder, D.filename, S.guid  
    FROM dscan D Left Join source S
        ON S.guid = D.guid    
          AND D.guid LIKE '%" & Replace(strSearch_guid, "'", "''") & "%'   
          AND D.filename NOT LIKE '.[_]%'    
          AND D.drive = 'Z:'
    WHERE S.guid Is Null  
    ORDER BY D.drive, D.guid
Charles Bretana
`union` kicks off a distinct. If these tables really are big, the three queries will return a ton of results, and the engine has to sort through and eliminate all of the repeated `null`s. I can't imagine that this would be faster on any level than doing an `in`, nor can SQL Server, which I've tested this on before.
Eric
@Eric, as Nick mentions, UNION kicks off a DISTINCT, so these duplicates should be excluded. If you see different behavior, then there are other columns in the output that are different. @Nick, in theory I agree with you, and I do not completely understand what the query processor does with a UNION, but in practice (as I said) I have seen this produce better results than using ORs in a WHERE clause predicate.
Charles Bretana
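
The UNION behaviour discussed in these comments can be checked with a small sketch. It uses Python's built-in sqlite3 module as a stand-in engine (an assumption; other engines should agree on the row counts but may differ in cost), with an invented two-row table. The same SELECT is repeated three times to imitate three branches that return overlapping rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dscan (guid TEXT, drive TEXT);
    INSERT INTO dscan VALUES ('a1', 'Z:'), ('b2', 'Z:');
""")

# The same SELECT three times, standing in for three branches that
# return overlapping rows.
branch = "SELECT guid, drive FROM dscan"

# UNION removes duplicate rows across the branches.
with_union = conn.execute(
    f"{branch} UNION {branch} UNION {branch} ORDER BY guid"
).fetchall()

# UNION ALL keeps every row from every branch.
with_union_all = conn.execute(
    f"{branch} UNION ALL {branch} UNION ALL {branch} ORDER BY guid"
).fetchall()

print(len(with_union))      # 2: duplicates removed
print(len(with_union_all))  # 6: every branch's rows kept
```

The duplicate removal is the extra work referred to above. How expensive it is on large tables depends on the engine and is best measured rather than assumed.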
A: 

I rewrote your query:

 SELECT d.guid, 
        d.drive, 
        d.folder, 
        d.filename,
        src.guid
   FROM DSCAN d
   JOIN (SELECT s.guid,
                s.guid & '_dtr' AS DTR,
                s.guid & '_dto' AS DTO
           FROM SOURCE s
          WHERE s.guid IS NOT NULL) src ON d.guid IN (src.guid, src.dtr, src.dto)
   WHERE d.guid LIKE '%' & REPLACE(strSearch_guid, "'", "''") & '%'
     AND d.filename NOT LIKE '.[_]%'  
     AND d.drive = 'Z:'
ORDER BY d.drive, d.guid

I assumed you had a typo regarding the source.guid IS NULL in the OP - it didn't make sense that you'd want only NULL source.guid records and then concatenate onto them.

This:

RIGHT JOIN dscan ON (source.guid & '_dtr' = dscan.guid OR 
                     source.guid & '_dto' = dscan.guid OR 
                     source.guid = dscan.guid)

...will only use the index, assuming one exists, on the guid column to create the values, not for comparison. If you need to do this, it's best to construct them in an inline view or CTE/Subquery Factoring.
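
A minimal sketch of the CTE idea follows, in a variation that builds the candidate values as rows rather than as extra columns, so the join condition becomes a single equality. It runs on SQLite through Python's built-in sqlite3 module as a stand-in (an assumption: SQLite concatenates with `||` where the query above uses `&`). The CTE name `candidates` and the sample rows are hypothetical, and the query keeps the asker's "no match in source" test.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE source (guid TEXT);
    CREATE TABLE dscan (guid TEXT, drive TEXT, folder TEXT, filename TEXT);
    INSERT INTO source VALUES ('a1'), ('b2');
    INSERT INTO dscan VALUES
        ('a1',     'Z:', 'f', 'a1.tif'),
        ('b2_dtr', 'Z:', 'f', 'b2_dtr.tif'),
        ('c3_dto', 'Z:', 'f', 'c3_dto.tif'),
        ('d4',     'Z:', 'f', 'd4.tif');
""")

# 'candidates' (a hypothetical name) lists every guid form that counts as
# a match, one per row, so the join itself is a plain equality test.
unmatched = conn.execute("""
    WITH candidates(guid) AS (
        SELECT guid           FROM source
        UNION ALL
        SELECT guid || '_dtr' FROM source
        UNION ALL
        SELECT guid || '_dto' FROM source
    )
    SELECT d.guid
    FROM dscan d
    LEFT JOIN candidates c ON c.guid = d.guid
    WHERE c.guid IS NULL AND d.drive = 'Z:'
    ORDER BY d.guid
""").fetchall()

print(unmatched)  # [('c3_dto',), ('d4',)]
```

Whether this is faster than the OR version on a given engine is something to confirm with the query plan.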

OMG Ponies
The source.guid IS NULL is in the right place in my query, since I only want records from the dscan table that don't match the source table. I had tried using IN in place of the ORs already, but with the same poor result.
A: 

Now let's see. You start off your SELECT statement with the following:

SELECT dscan.guid, dscan.drive, dscan.folder, dscan.filename, source.guid
FROM source RIGHT JOIN dscan ON

and end it with this:

WHERE source.guid Is Null
ORDER BY dscan.drive, dscan.guid

So if I read this correctly, you're trying to read everything from the dscan table and anything that matches from the source table.

What exactly links the two tables if you're looking for source.guid IS NULL? Because if source.guid is null, why would you be doing a RIGHT JOIN like:

source RIGHT JOIN dscan ON ( (source.guid & '_dtr' = dscan.guid OR source.guid & '_dto' = dscan.guid OR source.guid = dscan.guid)
AND dscan.guid LIKE '%" & Replace(strSearch_guid, "'", "''") & "%'
AND dscan.filename NOT LIKE '.[_]%'
AND dscan.drive = 'Z:')

Your query is taking a long time because it gets lost in the join. You're trying to make it filter by too many things, and filtering by LIKE '% blah blah %' does not help with speed. Check to see if you have the proper indexes on the source table and dscan table.

Why is WHERE source.guid IS NULL there? You need source.guid to be something to match it with dscan.guid in the dscan table.

The point of the source.guid IS NULL is to identify records in the dscan table that don't have matches in the source table. The other filters aren't the cause of the problem - it's definitely those ORs - and they are required to get the right results.