ansaurus

Question

SQL: Performance comparison for exclusion (Join vs Not in)

Answer 1

+1 A:

The questions has been asked several times. The often fastest way is to do this:

SELECT * FROM table1 
WHERE id in (SELECT id FROM table1 EXCEPT SELECT id FROM table2)

As the whole joining can be done on indexes, where using NOT IN it generally cannot.

inflagranti 2010-06-21 09:59:56

Thanks for the answer. I couldn't find previously asked questions.

someone 2010-06-21 10:10:22

This for instance I think is similar: http://stackoverflow.com/questions/3074862/deleting-all-records-of-a-table-that-are-not-referenced-from-another-table/3074873#3074873

inflagranti 2010-06-21 10:26:12

Yes, that's similar. But it's difficult to find that question as it lacks detailed description and tags. You know it because you answered that one 2 days ago :)

someone 2010-06-21 11:56:26

Hehe. Yeah, I tried finding it through manual search myself first too and I couldnt. So I may have had an advantage there ;)

inflagranti 2010-06-22 08:19:29

Answer 2

+1 A:

I prefer to use

Select a.Col1
From TableA a
Left Join TableB b on a.Col1 = b.Col1
Where b.Col1 Is Null

I believe this will be quicker as you are utilising the FK constraint (providing you have them of course)

Sample data:

create table #a
(
Col1 int
)
Create table #b
(
col1 int
)

insert into #a
Values (1)
insert into #a
Values (2)
insert into #a
Values (3)
insert into #a
Values (4)

insert into #b
Values (1)
insert into #b
Values (2)


Select a.Col1
From #a a 
Left Join #b b on a.col1 = b.Col1
Where b.Col1 is null

Barry 2010-06-21 10:00:41

Thank you for the answer but this solution does not work for me. One reason is Col1 cannot be null (tableB has only Col1). So your solution gives me no results.

someone 2010-06-21 10:16:14

@Masa44 - Are you sure? It works for me fine. I've added some test data as an example.

Barry 2010-06-21 10:32:48

Left join is often not the fastest way for many databases.

HLGEM 2010-06-21 13:21:00

Barry, thanks for the example. Your example works just fine but I failed to do so in my case, I guess I am making a small error.

someone 2010-06-22 08:57:21

Answer 3

+2 A:

Depends on the RDBMS. For Microsoft SQL Server NOT EXISTS is preferred to the OUTER JOIN as it can use the more efficient Anti-Semi join.

For Oracle Minus is apparently preferred to NOT EXISTS (where suitable)

You would need to look at the execution plans and decide.

Martin Smith 2010-06-21 10:01:12

Thanks for the answer. DBMS that I am interested in is Oracle. Are you sure that Minus is more efficient than a query with join or inflagranti's answer?

someone 2010-06-21 10:22:01

@masa44 Nope not at all. Though inflagranti's answer uses EXCEPT which *is* Minus in Oracle. The recommendation was to look at the execution plans.

Martin Smith 2010-06-21 10:34:05

+1 for recommending to investigate execution plans (on real data with updated statistics). Also the question is which indexes are present.

Unreason 2010-06-21 11:46:08

Thanks Martin Smith, good recommendation. If I were able, I would have also voted this up :)

someone 2010-06-21 11:57:30

Answer 4

A:

There is no correct answer to this question. Every RDBMS has query optimizer that will determine best execution plan based on available indices, table statistics (number of rows, index selectivity), join condition, query condition, ...

When you have relatively simple query like in your question, there is often several ways you can get results in SQL. Every self respecting RDBMS will recognize your intention and will create same execution plan, no matter which syntax you use (subqueries with IN or EXISTS operator, query with JOIN, ...)

So, best solution here is to write simplest query that works and then check execution plan.
If that solution is not acceptable then you should try to find better query.

zendar 2010-06-21 11:16:28

No, databases will not generate the same plan for differnt types of queries that have teh same result set. If you look vendor by vendor you will find out which are the most efficient ways to do specific types of queries.

HLGEM 2010-06-21 13:22:35

@HLGEM You obviously never looked at execution plans in Oracle in situation described in question.

zendar 2010-06-21 13:28:30

NO I l;ook at the execution plans in SQL server where there is huge differnce between such differnt approaches to the same query. ANd knowing that I know which type to try first. Write the simplest query is just bad advice. ANd "Every self respecting RDBMS will recognize your intention and will create same execution plan, no matter which syntax you use" is just plain false.

HLGEM 2010-06-21 14:34:31

ansaurus

tags:

views:

answers:

SQL: Performance comparison for exclusion (Join vs Not in)

related questions