views:

95

answers:

5

I have a Notes table with a uniqueidentifier column that I use as a FK for a variety of other tables in the database (don't worry, the uniqueidentifier columns on the other tables aren't clustered PKs). These other tables represent something of a hierarchy of business objects. As a simple representation, let's say I have two other tables:

  • Leads (PK LeadID)
  • Quotes (PK QuoteID, FK LeadID)

In the display of a Lead in the application, I need to show all notes related to the lead, including those tagged to any Quote that belongs to that lead. I have two options as far as I can see — either a UNION ALL or several LEFT JOIN statements. Here's how they'd look:

SELECT N.*  
FROM Notes N  
JOIN Leads L ON N.TargetUniqueID = L.UniqueID  
WHERE L.LeadID = @LeadID

UNION ALL

SELECT N.*  
FROM Notes N  
JOIN Quotes Q ON N.TargetUniqueID = Q.UniqueID  
WHERE Q.LeadID = @LeadID 

Or...

SELECT N.*  
FROM Notes N  
LEFT JOIN Leads L ON N.TargetUniqueID = L.UniqueID  
LEFT JOIN Quotes Q ON N.TargetUniqueID = Q.UniqueID  
WHERE L.LeadID = @LeadID OR Q.LeadID = @LeadID

In real life I have a total of five tables that the notes could be attached to, and that number could grow as the application grows. I already have non-clustered indexes set up on the uniqueidentifier columns I'm using, and SQL Profiler says I can't make any more improvements, but when I do a performance test on a realistically-sized test data set, I get the following numbers:

  • UNION ALL — 0.010 sec
  • LEFT JOIN — 0.744 sec

I had always heard that using UNION was bad, and that UNION ALL was only marginally better, but the performance numbers don't seem to bear that out. Granted, the UNION ALL SQL code might be more of a pain to maintain, but at that kind of performance difference it's probably worth it.

So is UNION ALL really better here or am I missing something on the LEFT JOIN code that is slowing things down?

+4  A: 

The UNION ALL version would probably be satisfied quite easily by 2 index seeks. OR can lead to scans. What do the execution plans look like?

Also have you tried this to avoid accessing Notes twice?

;WITH J AS
(
SELECT UniqueID FROM Leads WHERE LeadID = @LeadID
UNION ALL
SELECT UniqueID FROM Quotes WHERE LeadID = @LeadID
)

SELECT N.*  /*Don't use * though!*/
FROM Notes N  
JOIN J ON N.TargetUniqueID = J.UniqueID  
Martin Smith
+1 - UNION ALL is nice and fast if you know your data integrity is good (i.e. no dupes in original datasets).
JNK
Checking the execution plans showed that the UNION ALL could benefit from one more index...and it's fast as it is.The JOINS with the OR statements do indeed result in one expensive index scan, though I'm not sure which parts of that result would be relevant to provide here...any tips?
Josh
@Josh - Just out of interest what happens in the JOIN version if you replace `@LeadID` with a constant (e.g. `120` or whatever you are currently passing in) do you get the same plan with the scan?
Martin Smith
Right...forgot about a CTE! That pushed the query down to 0.007 sec, and makes it a lot less code that would be much easier to maintain.And yes, same plan whether constant or `int` variable.As for the *, I'm only using that in development -- all production queries have explicit column selects.
Josh
+1  A: 

In my experience, SQL Server is really bad with join conditions containing OR. I also use UNIONs in that case, and I got similar results like you (maybe half a second instead of 20).

Who said UNIONS are bad? Especially if you use UNION ALL, there should not be a performance hit, as UNION would have to go through the result to only keep unique records (actually doing something like distinct or group by).

Frank
A: 

UNION is slower, but UNION ALL should be pretty quick, right?

Dean J
+1  A: 

I may be wrong, but I think that you will get a better performance if you rewrite you JOIN version to

SELECT N.*  
FROM Notes N  
LEFT JOIN Leads L ON N.TargetUniqueID = L.UniqueID AND L.LeadID = @LeadID  
LEFT JOIN Quotes Q ON N.TargetUniqueID = Q.UniqueID  AND Q.LeadID = @LeadID
WHERE Q.LeadID IS NOT NULL OR L.LeadID IS NOT NULL
a1ex07
Wow, you're right — it went down to 0.166 sec. Still slower than the UNION ALL but definitely a massive improvement over the original JOINs. I would not have expected that...why would this be better?
Josh
From what I remember about SQL server query processing, `JOIN` processed before `WHERE`, so the more rows we eliminate in `JOIN` phase, the faster query would be. However, I'm a bit confused because here we have OUTER, not INNER join . Maybe `IS [NOT] NULL` works much faster than `=`, I cannot tell for sure... I tried similar queries in Mysql and it also showed the same results (this query is about 5-7 times faster than your original).
a1ex07
+1  A: 

You second query wouldn' even give correct results as it would covert the left joins to inner joins, see here for explantion as to why your syntax is bad:

http://wiki.lessthandot.com/index.php/WHERE_conditions_on_a_LEFT_JOIN

HLGEM
It does produce the correct results, though. I suspect the difference is that I'm using ORs instead of ANDs in the WHERE clause. In your link, the AND-ed conditions result in a limitation of the result set, hence the "conversion" to an inner join. I don't think that's the same thing as this scenario, is it?
Josh