I have a few questions on SQL:

  1. How to analyze the performance of a query? Any software, inbuilt features of MSSQL server 2005/2008?

  2. What should be used in place of IN queries so that performance is better? E.g.:

    SELECT * FROM enquiry_courses WHERE enquiry_id IN ( SELECT enquiry_id FROM enquiries WHERE session_id = '4cd3420a16dbd61c6af58f6199ac00f1' )

  3. Which is better: JOINS, EXISTS or IN in terms of performance?

Comments/Help appreciated...

+1  A: 
  1. Use the SQL Server Management Studio, and include Actual Execution Plan and SET STATISTICS TIME and SET STATISTICS IO.

  2. This IN corresponds to a JOIN, but rewriting probably won't matter. A guess is that you need indexes on enquiry_courses.enquiry_id and on enquiries.session_id to improve query performance.
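
A rough sketch of both points, to be run in SQL Server Management Studio. The index names are made up for illustration, and whether either index already exists on your tables is unknown, so treat this as a starting point rather than a recipe:

```sql
-- Hypothetical index names; skip any index that already exists.
CREATE INDEX IX_enquiries_session_id ON enquiries (session_id);
CREATE INDEX IX_enquiry_courses_enquiry_id ON enquiry_courses (enquiry_id);
GO

-- Report I/O and timing figures for every statement that follows.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
GO

SELECT *
FROM enquiry_courses
WHERE enquiry_id IN (SELECT enquiry_id
                     FROM enquiries
                     WHERE session_id = '4cd3420a16dbd61c6af58f6199ac00f1');
GO

-- Switch the reporting off again for this connection.
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
GO
```

The figures appear on the Messages tab, not in the result grid. Running the query once before and once after creating the indexes gives you numbers to compare.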

Jonas Lincoln
Thanks..could you plz explain a bit about 1st point? :)
Manish
You can review the execution plan to see how the query is executed: table scans, index lookups, etc. It takes a while to learn what to look for. The useful statistics are "logical reads" and "cpu (ms)", if you want to compare different queries.
Jonas Lincoln
A: 

MSSQL generally comes with a built in gui tool called Query Analyser which describes how the query will be executed.

For 2) you could rewrite as:

SELECT * 
FROM enquiry_courses ec 
WHERE EXISTS (select 1 FROM enquiries e 
              WHERE e.enquiry_id = ec.enquiry_id 
              and e.session_id ='4cd3420a16dbd61c6af58f6199ac00f1' )

but I can't believe it would make any performance difference in a modern RDBMS.

Adrian Pronk
A: 
  1. Check the execution plan.
  2. You can optimise your query by:
    • Using a "search argument" rather than IN
    • Putting an index on session_id
    SELECT * FROM enquiry_courses AS Courses, enquiries AS Enquiries
    WHERE Enquiries.session_id = '4cd3420a16dbd61c6af58f6199ac00f1'
    AND Courses.enquiry_id = Enquiries.enquiry_id

  3. EXISTS is better for performance.

EDIT: Exists & IN are better than JOIN for performance issues.

EDIT: I rewrote the query so that it's faster (I put the most restrictive condition first in the WHERE clause).

iChaib
-1 for your claim that exists is faster, which is simply not true.
erikkallen
According to Joe Celko's SQL for Smarties, we should use "Exists" whenever it's possible.
iChaib
@iChaib - did Joe happen to mention why? One reason would be if your SQL statement inside the IN clause returns any NULLs, nothing will match.
Jeff O
@erikkallen - Request you to plz give an answer... :-)
Manish
@GuinessFan @erikkallen I'll try to find the reference.
iChaib
iChaib
I'm not going to have an argument over a phrase in a book by an author I've never heard of, with a publishing date unknown to me. "Some SQL products" is quite vague, and actually for old versions of Oracle (10 years or so ago), my personal experience was that you should change it the other way around.
erikkallen
@erikkallen the book was published in 2003, ISBN: 097443552, Celko is a guru of SQL (member of the ANSI Database Standards Committee). Anyway, I've updated my post. I hope you're satisfied.
iChaib
I think we need some sort of benchmarking to tell us which is faster : Exists or IN
iChaib
A: 

3: I would expect an IN or EXISTS clause to be flattened to a JOIN by the database engine, so there shouldn't be a difference in performance. I don't know about SQL Server, but in Oracle you can verify this by checking the execution plan.
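
One way to check this on SQL Server, as a sketch: run the three forms in a single batch with "Include Actual Execution Plan" switched on in Management Studio and compare the plans. The JOIN form is only equivalent to the other two if enquiry_id is unique in enquiries (for example because it is the primary key), which is an assumption here.

```sql
-- Form 1: IN with a subquery (the original query).
SELECT ec.*
FROM enquiry_courses AS ec
WHERE ec.enquiry_id IN (SELECT e.enquiry_id
                        FROM enquiries AS e
                        WHERE e.session_id = '4cd3420a16dbd61c6af58f6199ac00f1');

-- Form 2: correlated EXISTS.
SELECT ec.*
FROM enquiry_courses AS ec
WHERE EXISTS (SELECT 1
              FROM enquiries AS e
              WHERE e.enquiry_id = ec.enquiry_id
                AND e.session_id = '4cd3420a16dbd61c6af58f6199ac00f1');

-- Form 3: inner join. Assumes enquiries.enquiry_id is unique;
-- otherwise rows from enquiry_courses can come back more than once.
SELECT ec.*
FROM enquiry_courses AS ec
INNER JOIN enquiries AS e
        ON e.enquiry_id = ec.enquiry_id
WHERE e.session_id = '4cd3420a16dbd61c6af58f6199ac00f1';
```

If the optimizer treats the forms alike, the plans will show the same operators with similar estimated costs; any difference you do see is specific to your data and indexes.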

Nils Weinander
A: 

This question suggests that EXISTS is quicker, which is what I had been taught: http://stackoverflow.com/questions/1071828/in-vs-exists-in-sqlserver-2005-or-generally-in-any-rdbms

One thing to note is that EXISTS and IN should be used in preference to NOT EXISTS and NOT IN

Bit of a tangent from performance but this is a good article on the subtle differences between IN and EXISTS http://weblogs.sqlteam.com/mladenp/archive/2007/05/18/60210.aspx

AJM
Please see my comment to the accepted answer.
erikkallen
But the blog post is good.
erikkallen
A: 
  1. As others have said, check the "execution plan". SQL Server Management Studio can show you two kinds of execution plans, estimated and actual. The estimated plan is how SQL Server guesses it would execute the query, and it is returned without actually executing the query. The actual plan is returned together with a result set and shows what was actually done.

  2. That query looks good, but you have to make sure that you have an index on enquiry_courses.enquiry_id, and it's probably best that enquiries.enquiry_id is not nullable.

  3. The semantics of IN and EXISTS are slightly different (IN will return no rows if there is one or more NULLs in the subquery). If the subquery is guaranteed to be not null, it doesn't matter. There is some kind of "internet truth" that you should use EXISTS on SQL Server and IN on Oracle. This might have been true when dinosaurs ruled the planet, but it doesn't apply anymore. IN and EXISTS both perform a semi-join, and the optimizer is more than capable of deciding how to execute this join.
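
For point 1, the two kinds of plan can also be requested from a query window with SET options instead of the toolbar buttons. A minimal sketch, reusing the query from the question:

```sql
-- Estimated plan: the statement is compiled but NOT executed.
-- SET SHOWPLAN_XML has to be the only statement in its batch, hence the GO lines.
SET SHOWPLAN_XML ON;
GO
SELECT *
FROM enquiry_courses
WHERE enquiry_id IN (SELECT enquiry_id
                     FROM enquiries
                     WHERE session_id = '4cd3420a16dbd61c6af58f6199ac00f1');
GO
SET SHOWPLAN_XML OFF;
GO

-- Actual plan: the statement runs, and the plan is returned
-- as an extra result alongside the normal result set.
SET STATISTICS XML ON;
GO
SELECT *
FROM enquiry_courses
WHERE enquiry_id IN (SELECT enquiry_id
                     FROM enquiries
                     WHERE session_id = '4cd3420a16dbd61c6af58f6199ac00f1');
GO
SET STATISTICS XML OFF;
GO
```

In Management Studio the same two things are available as "Display Estimated Execution Plan" (Ctrl+L) and "Include Actual Execution Plan" (Ctrl+M).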

erikkallen
IN does work with NULL. NOT IN fails
gbn
A: 

I guess the join gives the engine more freedom to choose the best query plan. In your exact case, all the solutions probably have similar performance.

SELECT enquiry_courses.* 
FROM enquiry_courses 
INNER JOIN enquiries ON enquiries.enquiry_id = enquiry_courses.enquiry_id 
                        AND enquiries.session_id = '4cd3420a16dbd61c6af58f6199ac00f1' 
guille
A: 

They each behave differently: it is not a performance choice

The only choice that is correct and reliable all the time is EXISTS or NOT EXISTS.

  • JOIN may need DISTINCT
  • WHERE/LEFT JOIN needs correct placement of the filter
  • NOT IN fails on NULL

Example:

DECLARE @Parent TABLE (foo int NULL)
INSERT @Parent (foo) VALUES (1)
INSERT @Parent (foo) VALUES (2)
INSERT @Parent (foo) VALUES (3)
INSERT @Parent (foo) VALUES (4)

DECLARE @Child TABLE (bar int NULL, foo int NULL)
INSERT @Child (bar, foo) VALUES (100, 1)
INSERT @Child (bar, foo) VALUES (200, 2)
INSERT @Child (bar, foo) VALUES (201, 2)
INSERT @Child (bar, foo) VALUES (300, NULL)
INSERT @Child (bar, foo) VALUES (301, NULL)
INSERT @Child (bar, foo) VALUES (400, 4)
INSERT @Child (bar, foo) VALUES (500, NULL)

--"positive" checks
SELECT -- multiple "2" = FAIL without DISTINCT
    P.*
FROM
    @Parent P JOIN @Child C ON P.foo = C.foo

SELECT -- correct
    P.*
FROM
    @Parent P
WHERE
    P.foo IN (SELECT c.foo FROM @Child C)

SELECT -- correct
    P.*
FROM
    @Parent P
WHERE
    EXISTS (SELECT * FROM @Child C WHERE P.foo = C.foo)

--"negative" checks
SELECT -- correct
    P.*
FROM
    @Parent P LEFT JOIN @Child C ON P.foo = C.foo
WHERE
    C.foo IS NULL

SELECT -- no rows = FAIL
    P.*
FROM
    @Parent P
WHERE
    P.foo NOT IN (SELECT c.foo FROM @Child C)

SELECT -- correct
    P.*
FROM
    @Parent P
WHERE
    NOT EXISTS (SELECT * FROM @Child C WHERE P.foo = C.foo)
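
If NOT IN has to stay for some reason, filtering the NULLs out of the subquery avoids the failure shown above. This is a sketch in its own batch, with a cut-down copy of the same sample tables:

```sql
DECLARE @Parent TABLE (foo int NULL)
INSERT @Parent (foo) VALUES (1)
INSERT @Parent (foo) VALUES (2)
INSERT @Parent (foo) VALUES (3)
INSERT @Parent (foo) VALUES (4)

DECLARE @Child TABLE (bar int NULL, foo int NULL)
INSERT @Child (bar, foo) VALUES (100, 1)
INSERT @Child (bar, foo) VALUES (200, 2)
INSERT @Child (bar, foo) VALUES (300, NULL)
INSERT @Child (bar, foo) VALUES (400, 4)

SELECT -- NULLs excluded from the list, so NOT IN can evaluate to true
    P.*
FROM
    @Parent P
WHERE
    P.foo NOT IN (SELECT C.foo FROM @Child C WHERE C.foo IS NOT NULL)
-- returns the single row foo = 3
```

NOT EXISTS still reads more safely, because it does not depend on anyone remembering the IS NOT NULL filter.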

Note: with EXISTS, the SELECT in the subquery is irrelevant as mentioned in ANSI 92 standard...

NOT EXISTS (SELECT * FROM @Child C WHERE P.foo = C.foo)
NOT EXISTS (SELECT NULL FROM @Child C WHERE P.foo = C.foo)
NOT EXISTS (SELECT 1 FROM @Child C WHERE P.foo = C.foo)
NOT EXISTS (SELECT 1/0 FROM @Child C WHERE P.foo = C.foo)
gbn