views:

474

answers:

8

I much prefer to code in T-SQL using what is in effect an in-line join, rather than having a long list of joins at the end of the stored procedure or view.

For example, I code:

SELECT PKey, Billable,
    (SELECT LastName FROM Contact.dbo.Contacts WHERE PKey = Contacts_PKey) AS LastName,
    (SELECT Description FROM Common.dbo.LMain WHERE PKey = DType) AS Description,
    (SELECT TaskName FROM Common.dbo.LTask WHERE PKey = TaskType) AS TaskName,
    StartTime, EndTime, SavedTime
FROM dbo.TopicLog
WHERE StartTime > '7/9/09'
ORDER BY StartTime

Rather than

SELECT t.PKey, t.Billable, c.LastName, m.Description, lt.TaskName,
    t.StartTime, t.EndTime, t.SavedTime
FROM dbo.TopicLog AS t
INNER JOIN Contact.dbo.Contacts AS c ON c.PKey = t.Contacts_PKey
INNER JOIN Common.dbo.LMain AS m ON m.PKey = t.DType
INNER JOIN Common.dbo.LTask AS lt ON lt.PKey = t.TaskType
WHERE t.StartTime > '7/9/09'
ORDER BY t.StartTime

I prefer this type of syntax because it is so much less confusing to write and debug, especially when there are many tables being joined or other things going on (CASE statements, T-SQL functions, self joins, etc.).

But my question is: am I taking a performance hit by querying the database this way?

I do not have enough data collected yet to be able to measure a difference, but I will at some point down the road.

I would like to find out before I proceed further. I would not want to have to go back later and recode everything to improve performance.

+19  A: 

The second one (the actual inner join), generally. The first one (correlated subqueries) in principle runs three extra queries for every row, but this is generally managed by the compiler so that the difference is mitigated.

Best yet: Check the query execution plans for yourself!

If you do run into slow performance, my guess is that your tables aren't indexed properly. You should have clustered indexes on all of your primary keys, and non-clustered indexes on the foreign keys (the ones you use to make the joins).
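For the tables in the question, that indexing might look something like the following sketch. It assumes PKey is not yet declared as the primary key of TopicLog and that the foreign key columns are unindexed; the constraint and index names are made up for illustration:

```sql
-- Clustered index on the primary key (created automatically if PKey
-- was already declared PRIMARY KEY when the table was built):
ALTER TABLE dbo.TopicLog
    ADD CONSTRAINT PK_TopicLog PRIMARY KEY CLUSTERED (PKey);

-- Non-clustered indexes on the columns used in the joins:
CREATE NONCLUSTERED INDEX IX_TopicLog_Contacts_PKey ON dbo.TopicLog (Contacts_PKey);
CREATE NONCLUSTERED INDEX IX_TopicLog_DType    ON dbo.TopicLog (DType);
CREATE NONCLUSTERED INDEX IX_TopicLog_TaskType ON dbo.TopicLog (TaskType);
```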

I should note that these two queries are equivalent if and only if there is a matching value for every join condition (i.e., every row of the main table finds a match in each lookup). Otherwise, you'll get NULL from the subquery when there's no match, while an inner join actively filters out any rows that don't satisfy the join conditions. The subquery approach is actually equivalent (in results, not in speed or execution) to a left outer join.

Eric
+1. As you point out, the wins to be had from thoughtful indexing are far more likely to produce significant gains. (But checking the execution plan will let them know for sure!)
Beska
+1 "Check the query execution plans for yourself!" That is the only way to be sure. The optimizer *might* turn them into JOINs for you automatically. Although, the two queries are not exactly the same. #1 is a LEFT JOIN, #2 is an INNER JOIN. So they will give you different plans anyway.
beach
This is quite misleading - it's a common misconception that the subqueries are slower for the reason you have given, when in fact SQL Server rewrites subqueries as joins where possible during compilation.
Kragen
@kragen: Please read the entirety of my entry. Specifically, "but this is generally managed by the compiler" and "Check the query execution plans." Also see beach's comment. This has been addressed and is not misconstrued in the least.
Eric
A: 

Generally speaking, sub-queries (i.e. the first example) are slower, but the easiest way to analyze and optimize your queries is to try them against your specific database. MS SQL Server provides excellent analysis and performance-tuning tools.

Martin Dale Lyness
That's simply untrue - often SQL Server parses subqueries into an execution tree which is identical to the one produced by a join.
Kragen
+10  A: 

The first method is not an inner join at all, it is a correlated subquery. And they are more like left outer joins than inner joins as they will return NULLs when there is no matching value.

Will Rickards
+3  A: 

The first one looks like a pathological way to do a join to me. I would avoid it, if for no other reason than that it's unusual - an experienced SQL DBA maintaining it will spend a while searching for a reason it's coded that way, when there is no real reason as far as what you want the query to do. It also behaves more like an outer join if there's missing data.

The second example looks normal.

You should know that the old-school way of doing inner joins is like this:

SELECT t.PKey, t.Billable,
    c.LastName, m.Description, lt.TaskName,
    t.StartTime, t.EndTime, t.SavedTime
FROM dbo.TopicLog AS t, Contact.dbo.Contacts AS c,
    Common.dbo.LMain AS m, Common.dbo.LTask AS lt
WHERE c.PKey = t.Contacts_PKey AND t.StartTime > '7/9/09'
    AND m.PKey = t.DType
    AND lt.PKey = t.TaskType
ORDER BY t.StartTime

And at a guess this is equivalent to the modern "INNER JOIN table ON field" syntax once it has been parsed.

As the other answer says, if you're looking for faster queries, the first thing to do is to check that the tables' indexes are sorted out. Then look at the query execution plan.

Anthony
It seems like this syntax is what he's after. Indexes or no indexes, doing subqueries for each row of the table being selected from is going to be slow, even for a fairly small table (more than maybe 4000 rows).
Jon
A: 

A lot of SQL programmers are completely unaware that the optimizer frequently resolves subqueries into joins. There is likely no cause for performance woes in either query.

View the execution plan!
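In SQL Server Management Studio you can include the actual plan with Ctrl+M before running the query, or request the estimated plan from T-SQL. A minimal sketch (SET SHOWPLAN_XML must be the only statement in its batch):

```sql
SET SHOWPLAN_XML ON;
GO
-- Run the query to compare here: the XML plan is returned
-- instead of the query results.
GO
SET SHOWPLAN_XML OFF;
GO
```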

David B
+1  A: 

The two queries in the OP say very different things and only produce the same results if the correct data model assumptions are in place:

  1. Each of the columns used in the lookup has a NOT NULL constraint and a foreign key constraint.

  2. The primary key or a unique key of the lookup table is used.

It may be that in the OP's specific case these assumptions are true, but in the general case the two queries are different.

As others have pointed out, the sub query is more like an outer join in that it will give back a null for the columns LastName, Description and Taskname instead of filtering out the row entirely.

In addition, if one of the subqueries returns more than one row, you will get an error.
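A minimal repro of that error (the temp table and data are made up for illustration):

```sql
CREATE TABLE #Lookup (K int, V varchar(10));
INSERT INTO #Lookup VALUES (1, 'a'), (1, 'b');  -- duplicate key: no unique constraint

-- Fails at runtime with "Msg 512: Subquery returned more than 1 value":
SELECT (SELECT V FROM #Lookup WHERE K = 1) AS V;

-- The join form instead returns one row per match (two rows here):
SELECT l.V
FROM (SELECT 1 AS K) AS t
INNER JOIN #Lookup AS l ON l.K = t.K;
```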

As far as personal preference, I prefer the second example with the join syntax, but that is subjective.

Shannon Severance
A: 

I think the second one executes faster. The reason is that by using alias names (t, c, m, etc. in your example) the relational engine can more easily resolve the reference to each table's location.

I think this is one of the tips in SQL tuning.

anishmarokey
A: 

Generally speaking there is no difference in the performance of simple subqueries vs. joins - it is a common misconception that subqueries are much slower (because SQL Server has to loop through the inner query); generally speaking this is simply untrue! During the compilation process SQL Server produces an execution tree, and often in these trees subqueries are equivalent to joins.

It's worth noting that your two queries are not logically the same and produced different results for me; the second query should actually read something along the lines of the following (this still isn't identical, but it's closer):

SELECT t.PKey, t.Billable, c.LastName, m.Description, lt.TaskName, t.StartTime, t.EndTime, t.SavedTime
FROM dbo.TopicLog AS t     
LEFT OUTER JOIN Contact.dbo.Contacts as c   on  c.Pkey = t.Contacts_PKey
LEFT OUTER JOIN Common.dbo.LMain  as m  on  m.PKey = t.DType
LEFT OUTER JOIN Common.dbo.LTask  as lt on lt.PKey = t.TaskType
WHERE t.StartTime > '7/9/09'
ORDER BY t.StartTime

In my testing the subquery produced an execution plan with a drastically lower number of reads (15 as opposed to 1000), but slightly higher CPU - on average the execution times were roughly equivalent.

It's worth noting, however, that this won't always be the case (particularly when evaluating functions inside a subquery), and sometimes you may run into problems because of a subquery. Generally speaking, though, it's best to worry about such cases only when you actually run into performance problems.
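To collect read and CPU numbers like these for your own queries, you can turn on the session statistics before running each version:

```sql
SET STATISTICS IO ON;   -- reports logical/physical reads per table
SET STATISTICS TIME ON; -- reports parse/compile and CPU/elapsed times

-- Run each version of the query here; the figures appear
-- on the Messages tab in Management Studio.

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```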

Kragen