tags:

views:

74

answers:

3

I'm not able to figure out the difference between these queries. I'm pretty sure that the first one is an equi-join. I'm not sure how the second one ISN'T the same as the first. The sub query in #2 selects all Ids from S, and then it returns all R's that also have those ID's, no?

SELECT R.cname FROM R, S, WHERE R.Id = S.Id

SELECT R.cname FROM R WHERE R.Id IN (SELECT S.Id FROM S)

EDIT:
i grabbed this off a past exam....and I quote:
Although the queries seem to produce the same answers they don’t. How do they differ? You can give example instances of R and S that show that these two queries can produce different answers.
R (Id, TourId, cname, caddress, cost)
S (Id, SpecId)

EDIT: EDIT:
I'm guessing it has to do with duplicates.

+1  A: 

From application developer point of view both sub queries and join queries can work for you if you have well defined relation between tables (especially for join queries). But in general an equi join (or cross join which DB handles as equi join), can be faster as DB can optimize the join query. But in case of sub query, DB has to run both queries independently so you may see slightly slow performance. That being said, join can be slow if tales have too much data and sub queries may work in those case. general rule of thumb is to use join as and when possible. If you see some issue try sub query.

Fazal
+1  A: 

Assuming these data:

R

id   cname
--   --
1    RA
1    RB
2    RC

S
id   cname
--   --
1    SA
1    SB
2    SC
2    SD

, this query:

SELECT  R.cname
FROM    R, S
WHERE   R.Id = S.Id

returns

RA
RA
RB
RB
RC
RC

, while this one:

SELECT  R.cname
FROM    R
WHERE   R.Id IN
        (
        SELECT  S.Id
        FROM S
        )

returns

RA
RB
RC

The first query is a JOIN of two tables: for each matching id it returns all possible combinations of the records from both tables

The seconds query is a one table with a predicate. It can return each record from R at most once.

Quassnoi
+1  A: 

IMO, both are poor choices because they are harder to read than:

Select R.Naem 
From R 
    Join S 
        On S.ID = R.ID

Beyond functionality, the paramount goal in writing code should be to make your intent clear for other readers. A developer has to read the comma in the first example and know that this implies a Cross Join. In the second, the reader has to determine why you wrote a subquery instead of simply writing a Join. Was there a performance reason? Was there reason with respect to the result?

Functionally, the two queries can produce different results depending on the data. Suppose we have the following:

Create Table S ( Id int null )
Create Table R ( Id int null )

insert S Values(1)
insert S Values(1)
insert S Values(2)

insert R Values(1)
insert R Values(2)
insert R Values(3)

Select * From @R As R, @S As S Where R.Id = S.Id

This results in 1,1,2

Select * From @R As R Where R.Id In( Select S.Id From @S As S )

This results in 1,2

The second query will produce one row from R that has one or more values in S. The first query is akin to an Inner Join (even though it is using a Cross Join) which will return a row for each row in R and S that match. Since there are two rows in S that match a row in R, we get two rows for the value 1.

Thomas
+1 For pointing out the alternative syntax as well. Far too many people mix their join criteria with their selection criteria (and no, they do not produce identical results all the time either).
Duncan