views:

313

answers:

5

I'm designing a query in SSMS 2005 which looks something like this:

SELECT COUNT(DISTINCT ColumnName) FROM Table WHERE ColumnName IS NOT NULL

When I run the query with COUNT() it returns the value 1. When I run it without COUNT(), SSMS reports the correct value eg 212 records.

The column in question is of datatype numeric(16, 0).

For those who might ask, the query in full is:

SELECT COUNT(DISTINCT O_ID) FROM vEmployers
INNER JOIN vEnrolment ON O_ID = E_EnrolmentEmployer
WHERE E_START >= '01-AUG-2008' AND E_START < '01-AUG-2009'
AND O_ID IS NOT NULL AND O_ID IN (
    SELECT O_ID FROM vEmployers
    INNER JOIN vEnrolment ON O_ID = E_EnrolmentEmployer
    WHERE E_Start < '01-AUG-2008' and E_Start >= '01-AUG-2007'
)

This query basically gives a repeat business figure between two 12month periods.

So I'm wondering why "COUNT(DISTINCT ColumnName)" is returning 1 when "ColumnName IS NOT NULL" has been specified?

Here is a sample of the data when SELECT TOP 10 DISTINCT ColumnName FROM... etc is run:

1346116
1346131
1346425
1346923
1349935
1350115
1350153
2594787
2821944
2879631
A: 

I'm guessing it's because all rows returned share the same value for O_ID. You can do a COUNT(*) or COUNT() on a key that is unique to each row to get the row count.

Håvard S
All returned rows are unique because of the DISTINCT clause, proven by doing the query with COUNT().
GateKiller
Can you provide a short excerpt of the results without the count, just showing a few O_ID?
Turnkey
Well, COUNT(DISTINCT ...) will of course count unique non-NULL values, and that's just it. Don't do distinct, count(*) or count something that is unique across all your rows, WITHOUT a DISTINCT clause.
Håvard S
Your right about how COUNT(DISTINCT ...) works and the result should be the same as the row count without the COUNT() function...
GateKiller
Thanks for the confirm, just added as an answer.
Turnkey
A: 

Remove the DISTINCT and you'll get a count on all rows.

Turnkey
True. But as you can see from the full query, there is a join involved so this would return duplicate ID's. And it doesn't answer the question of why COUNT() is returning 1 when it shouldn't.
GateKiller
Yes, that is puzzling, thanks for posting the additional info. Did you run the exact same query to get the excerpt, just removing the count?
Turnkey
Yeah, the excerpt is the exact same query without using the COUNT() function. Very puzzling indeed!
GateKiller
I wonder if this could be data type related. I wonder if you could add a CAST in the COUNT clause to cast it to an INT type to see if that changes anything?
Turnkey
@Turnkey: That totally worked "Count(Distinct Cast(O_ID as Int))" :D. Please can you submit that as an answer, I have some rep points for you.
GateKiller
A: 

Could you please run these queries:

SELECT  COUNT(DISTINCT O_ID)
FROM    vEmployers
INNER JOIN
        vEnrolment
ON      O_ID = E_EnrolmentEmployer
WHERE   E_START >= '01-AUG-2008' AND
        E_START < '01-AUG-2009'
        AND O_ID IN
        (
        SELECT  O_ID
        FROM    vEmployers
        INNER JOIN
                vEnrolment
        ON      O_ID = E_EnrolmentEmployer
        WHERE   E_Start < '01-AUG-2008'
                AND E_Start >= '01-AUG-2007'
        )

and

SELECT  DISTINCT TOP 5 O_ID
FROM    vEmployers
INNER JOIN
        vEnrolment
ON      O_ID = E_EnrolmentEmployer
WHERE   E_START >= '01-AUG-2008' AND
        E_START < '01-AUG-2009'
        AND O_ID IN
        (
        SELECT  O_ID
        FROM    vEmployers
        INNER JOIN
                vEnrolment
        ON      O_ID = E_EnrolmentEmployer
        WHERE   E_Start < '01-AUG-2008'
                AND E_Start >= '01-AUG-2007'
        )
ORDER BY
        O_ID

verbatim, without changing anything?

Quassnoi
The first query returns one row with the value "1".The second query returns five rows of unique values.
GateKiller
@GateKiller: could you please post the structure of the tables?
Quassnoi
What information are you interested in? I'm not sure I would be allowed to post the full table schema + each table has ALOT of columns.
GateKiller
@GateKiller: Just post the relevant columns: `O_ID`, `E_START`, `E_EnrolmentEmployer`: their datatypes, indexes if any and which tables they belong to. Also, it would be nice to see the execution plans for each query. Just run `SET SHOWPLAN_TEXT ON \n GO \n SELECT …` (`\n` is a newline)
Quassnoi
A: 
SELECT   
  COUNT(*)  
FROM    vEmployers  
INNER JOIN vEnrolment ON O_ID = E_EnrolmentEmployer
WHERE 
        E_START >= '01-AUG-2008' 
        AND E_START < '01-AUG-2009'
        AND O_ID IS NOT NULL AND O_ID IN (
          SELECT O_ID FROM vEmployers
          INNER JOIN vEnrolment ON O_ID = E_EnrolmentEmployer
          WHERE E_Start < '01-AUG-2008' and E_Start >= '01-AUG-2007'
        )
GROUP BY
  O_Id
Benoit Vidis
+2  A: 

The use of the numeric(16, 0) made me suspect that it was data type related. Add a CAST in the COUNT clause to cast it to an INT type:

Count(Distinct Cast(O_ID as Int))
Turnkey
Thankyou Turnkey :)
GateKiller