views:

48

answers:

3

I have 3 sql tables:

Data36 (Data_ID:int <PK>, type:int), 
Data38(Data_ID:int <PK>, clientId:int), 
Data47(Data_ID:int <PK>, payerID:int).

I thought the following queries are identical, because I don't use aggregate functions here and GROUP BY should behave the same way as DISTINCT. But they return very different result sets and I don't understand why. Please help me to understand defference between these queries.

Query 1 (returns 153 rows):

SELECT payer.Data_ID, payer.type
 FROM Data36 AS payer
 JOIN Data38 AS serv ON payer.Data_ID = serv.clientId
 WHERE ((SELECT count(*) FROM Data47 AS regsites WHERE regsites.payerID = payer.Data_ID) = 0)
 GROUP BY payer.Data_ID, payer.type

Query 2 (returns 4744 rows):

SELECT DISTINCT payer.Data_ID, payer.type
 FROM Data36 AS payer
 JOIN Data38 AS serv ON payer.Data_ID = serv.clientId
 WHERE ((SELECT count(*) FROM Data47 AS regsites WHERE regsites.payerID = payer.Data_ID) = 0)

SQL Server version is 5.0.40.

Let me know if you need more specific information.

Update: Sorry for not mentioning this: Data_ID is a Primary Key in these tables, so Data_ID is unique for each record in these tables.

SELECT count( *  ) FROM Data36 
--returns 5998
SELECT count(DISTINCT Data_ID) FROM Data36 
--returns 5998

Update 2: In Query 1 I changed 'GROUP BY payer.Data_ID' to 'GROUP BY payer.Data_ID, payer.type'. The result is still the same - 153 rows.

A: 

Whan happens when you try this?

select payer.Data_ID, payer.type from
(
SELECT DISTINCT Data_ID, type 
 FROM Data36
) AS payer 
 JOIN Data38 AS serv ON payer.Data_ID = serv.clientId 
 WHERE ((SELECT count(*) FROM Data47 AS regsites WHERE regsites.payerID = payer.Data_ID) = 0) 
Madhivanan
This query returns 9210 result. But payer table (Data36) has only 5995 rows..
Alex
A: 

Correct and most efficient way to found records that hasn't coresponding records in the third table is:

SELECT payer.Data_ID, payer.type
FROM Data36 AS payer
JOIN Data38 AS serv ON payer.Data_ID = serv.clientId
LEFT JOIN Data47 AS regsites  ON(regsites.payerID = payer.Data_ID)
WHERE regsites.payerID IS NULL
GROUP BY payer.Data_ID
Naktibalda
Thank you. But can you explain why Query 1 and Query 2 return different results?
Alex
Query1 returns 1 row for unique payer.Data_ID value.Query2 returns 1 row for unique combination of payer.Data_ID and payer.type.Analyze result of query 2 visually.
Naktibalda
A: 

Looking at your query there appear to be only one functional difference...

Query1 (153 records):

SELECT payer.Data_ID, payer.type FROM <blah> GROUP BY payer.Data_ID

Query2 (4744 records):

SELECT DISTINCT payer.Data_ID, payer.type FROM <blah>


As Query1 only groups by Data_ID I would say that you have 153 Data_IDs but that each Data_ID can have many different type values.

Query2 is returning every different Data_ID/type combination, but Query1 is only returning 1 record for each Data_ID.


I'm also surprised that Query1 runs at all, as I would expect you would be Required to have an aggregate function such as MIN/MAX around the type field.

Dems
Sorry for not mentioning this: Data_ID is a Primary Key in these tables, so Data_ID is unique for each record in these tables.So Date_ID/type combination is the same for each row because they are from the same table.
Alex
@Dems: Welcome to MySQL, whose "feature" is to allow omitting columns from the GROUP BY by design: http://dev.mysql.com/doc/refman/5.0/en/group-by-hidden-columns.html
OMG Ponies