ansaurus

Question

Null value returned on Count Distinct (pl sql)

Answer 1

A:

COUNT(DISTINCT column) doesn't count NULL values:

SELECT  COUNT(DISTINCT val1)
FROM    (
        SELECT  NULL AS val1
        FROM    dual
        )

---
0

Could it be the case?

Quassnoi 2009-06-18 17:23:47
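
The NULL-skipping behaviour described above can be reproduced outside Oracle. The following is a minimal sketch that assumes SQLite (through Python's sqlite3 module) as a stand-in for Oracle; the table t and its contents are hypothetical and are not part of the original question.

```python
import sqlite3

# Assumption: SQLite stands in for Oracle; table "t" is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (val1 TEXT)")
conn.executemany(
    "INSERT INTO t (val1) VALUES (?)",
    [(None,), (None,), ("a",), ("a",), ("b",)],
)

# COUNT(DISTINCT col) and COUNT(col) skip NULLs; COUNT(*) counts every row.
distinct_count, nonnull_count, row_count = conn.execute(
    "SELECT COUNT(DISTINCT val1), COUNT(val1), COUNT(*) FROM t"
).fetchone()
print(distinct_count, nonnull_count, row_count)  # 2 3 5

# With nothing but NULLs in the column, the distinct count is 0, not NULL.
only_nulls = conn.execute(
    "SELECT COUNT(DISTINCT val1) FROM t WHERE val1 IS NULL"
).fetchone()[0]
print(only_nulls)  # 0
```

Note that the aggregate still returns a row containing 0 here, so NULL values alone would not explain a query that returns no value at all.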

Answer 2

A:

I would try putting the HAVING clause conditions in the WHERE clause instead. Is there any reason you chose HAVING? Just FYI, HAVING is a filter that is done after the result set is returned which may cause unexpected results. Also it is not used in the optimization of the query. If you don't have to use HAVING I would suggest not using it.

I would suggest adding the counts to the SELECT clause then joining them in the WHERE clause.

northpole 2009-06-18 17:33:46

(A) He is using HAVING so he can do a condition on aggregate functions, i.e. COUNT. (B) I don't know if you're basing your information on some other DBMS, but in Oracle the HAVING clause is most certainly a part of the query that is parsed and optimized along with everything else.

Dave Costa 2009-06-18 17:35:49

In Oracle, a SQL statement can use both a WHERE clause and a HAVING clause. The WHERE clause filters rows as they are selected from the table, before grouping; the HAVING clause filters groups after the grouping.

northpole 2009-06-18 17:39:32
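
The WHERE-before-grouping, HAVING-after-grouping order from the comment above can be seen on a small example. This is a sketch only, assuming SQLite via Python's sqlite3 module rather than Oracle; the orders table and its rows are hypothetical.

```python
import sqlite3

# Assumption: SQLite stands in for Oracle; table "orders" is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ann", 5), ("ann", 50), ("bob", 60), ("bob", 70), ("cy", 80)],
)

# WHERE drops individual rows first, then HAVING drops whole groups.
with_where = conn.execute(
    """
    SELECT customer, COUNT(*)
      FROM orders
     WHERE amount > 10
     GROUP BY customer
    HAVING COUNT(*) > 1
     ORDER BY customer
    """
).fetchall()

# Without the WHERE filter, ann keeps both rows and survives the HAVING.
without_where = conn.execute(
    """
    SELECT customer, COUNT(*)
      FROM orders
     GROUP BY customer
    HAVING COUNT(*) > 1
     ORDER BY customer
    """
).fetchall()

print(with_where)     # [('bob', 2)]
print(without_where)  # [('ann', 2), ('bob', 2)]
```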

additionally, HAVING is for aggregation and is not used in the optimization of the query in MySQL or ORACLE.

northpole 2009-06-18 17:44:28

Putting the HAVING clause in the WHERE throws an error. ORA-00934: group function is not allowed here

Dave 2009-06-18 17:48:39
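
The error above reflects a general SQL rule: aggregate functions cannot appear in WHERE, because WHERE is evaluated before any grouping happens. A minimal sketch, assuming SQLite via Python's sqlite3 module (its error text differs from Oracle's ORA-00934) and a hypothetical table t:

```python
import sqlite3

# Assumption: SQLite stands in for Oracle; table "t" is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (grp TEXT, val INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [("a", 1), ("a", 2), ("b", 3)])

# An aggregate inside WHERE is rejected when the statement is compiled.
try:
    conn.execute("SELECT grp FROM t WHERE COUNT(val) > 1 GROUP BY grp").fetchall()
    rejected = False
except sqlite3.Error as exc:
    rejected = True
    print("rejected:", exc)

# The same condition is legal in HAVING, which runs after grouping.
groups = conn.execute(
    "SELECT grp FROM t GROUP BY grp HAVING COUNT(val) > 1"
).fetchall()
print(groups)  # [('a',)]
```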

No, I mean removing the HAVING altogether and using the SELECT and WHERE clauses to handle it.

northpole 2009-06-18 17:50:10

Yes, that's what I did. SELECT COUNT(DISTINCT t442.c1) ... AND t649.c536870939 > 1 AND count(distinct t631.c536870922) = count(distinct t649.c536870931)

Dave 2009-06-18 17:51:35

ahhh, yes, you are correct. Testing this now in my environment shows the same. I was hopeful that you could alias the count and use that in the where clause....let me continue to think about other solutions.

northpole 2009-06-18 17:57:59

Answer 3

+1 A:

What is the result of:

SELECT COUNT (DISTINCT t631.c536870922),
       COUNT (DISTINCT t649.c536870931)
          FROM t442, t658, t631, t649
         WHERE t442.c1 = t658.c536870930
           AND t442.c200000003 = 'Network'
           AND t442.c536871139 < 2
           AND t631.c536870913 = t442.c1
           AND t658.c536870925 = 1
           AND (t442.c7 = 6 OR t442.c7 = 5)
           AND t442.c536870954 > 1141300800
           AND (t442.c240000010 = 0)
           AND t442.c1 = t649.c536870914
           AND t649.c536870939 > 1

If the two columns there never have equal values, then it makes sense that adding the HAVING clause would eliminate all rows from the result set.

Dave Costa 2009-06-18 17:34:28

4 and 3, respectively. See next comment.

Dave 2009-06-18 17:43:09

Also, even if the HAVING clause did eliminate all rows, shouldn't I get 0 instead of null? One of the other places I use this query does correctly return 0.

Dave 2009-06-18 17:49:54

Your query is functionally equivalent to one like SELECT x FROM (SELECT 3 AS x, 4 AS y FROM dual) WHERE x = y. Your result set has one row (the count results) with two columns, and you're saying show me the rows where they're equal. If they're not equal, no rows are returned. If you're getting a 0 in one place, that must be because both of those counts are 0 there.

Steve Broberg 2009-06-18 18:07:45
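
The difference between "a row containing 0" and "no row at all" in the comment above can be sketched as follows. This assumes SQLite via Python's sqlite3 module as a stand-in for Oracle, and it uses the subquery-plus-WHERE form of the equivalence because some older SQLite versions do not accept HAVING without GROUP BY. The pairs table and its columns are hypothetical.

```python
import sqlite3

# Assumption: SQLite stands in for Oracle; table "pairs" is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pairs (k TEXT, a INTEGER, b INTEGER)")

# One aggregate row is produced, then kept only if the two counts match.
QUERY = """
SELECT n
  FROM (SELECT COUNT(DISTINCT k) AS n,
               COUNT(DISTINCT a) AS ca,
               COUNT(DISTINCT b) AS cb
          FROM pairs)
 WHERE ca = cb
"""

# Empty input: both counts are 0, they are equal, so a row with 0 comes back.
empty_case = conn.execute(QUERY).fetchone()
print(empty_case)  # (0,)

conn.executemany(
    "INSERT INTO pairs VALUES (?, ?, ?)",
    [("x", 1, 10), ("x", 2, 10), ("y", 3, 20)],
)

# Now ca = 3 and cb = 2: the single row is filtered out and nothing comes back.
unequal_case = conn.execute(QUERY).fetchone()
print(unequal_case)  # None
```

A client that expects exactly one value will typically surface the second case as a null or missing value rather than as 0.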

Answer 4

A:

If I do this:

SELECT distinct t442.c1, count(distinct t631.c536870922), 
    count (distinct t649.c536870931)
          FROM t442, t658, t631, t649
         WHERE t442.c1 = t658.c536870930
           AND t442.c200000003 = 'Network'
           AND t442.c536871139 < 2
           AND t631.c536870913 = t442.c1
           AND t658.c536870925 = 1
           AND (t442.c7 = 6 OR t442.c7 = 5)
           AND t442.c536870954 > 1141300800
           AND (t442.c240000010 = 0)
           AND t442.c1 = t649.c536870914
           AND t649.c536870939 > 1
           group by t442.c1
           having count(distinct t631.c536870922)= 
                         count (distinct t649.c536870931)

I see the 23 rows that should be counted. Removing the HAVING clause returns 24 rows; the extra one does not meet the HAVING criterion.

EDIT: Results of the query, as requested per Steve Broberg:

row | t442.c1         | cnt t631 | cnt t649
-------------------------------------------
1   | CHG000000230378 |    2     |    1
2   | CHG000000230846 |    1     |    1
3   | CHG000000232562 |    1     |    1
4   | CHG000000232955 |    1     |    1
5   | CHG000000232956 |    1     |    1
6   | CHG000000232958 |    1     |    1
7   | CHG000000233027 |    1     |    1
8   | CHG000000233933 |    1     |    1
9   | CHG000000233934 |    1     |    1
10  | CHG000000233997 |    1     |    1
11  | CHG000000233998 |    1     |    1
12  | CHG000000233999 |    1     |    1
13  | CHG000000234001 |    1     |    1
14  | CHG000000234005 |    1     |    1
15  | CHG000000234009 |    1     |    1
16  | CHG000000234012 |    1     |    1
17  | CHG000000234693 |    1     |    1
18  | CHG000000234696 |    1     |    1
19  | CHG000000234730 |    1     |    1
20  | CHG000000234839 |    1     |    1
21  | CHG000000235115 |    1     |    1
22  | CHG000000235224 |    1     |    1
23  | CHG000000235488 |    1     |    1
24  | CHG000000235847 |    1     |    1

The first row is filtered out properly if I include the HAVING clause.

Dave 2009-06-18 17:45:01

Given that there are only 23 rows, can you include the results of that query?

Steve Broberg 2009-06-18 18:09:45

See answer post below.

Dave 2009-06-18 18:38:59

Or just this edited post...still learning how to use this site.

Dave 2009-06-18 18:44:16

See my answer below

Steve Broberg 2009-06-18 18:53:07

Answer 5

+2 A:

I understand now. Your problem in the original query is that it is highly unusual (if not, in fact, wrong) to use a HAVING clause without a GROUP BY clause. The answer lies in the order in which the various parts of the query are performed.

In the original query, you do this:

SELECT COUNT(DISTINCT t442.c1)
  FROM ...
 WHERE ...
HAVING COUNT(DISTINCT t631.c536870922) = COUNT(DISTINCT t649.c536870931);

The database will perform your joins and constraints, at which point it does any GROUP BY and aggregation operations. In this case, you are not grouping, so the COUNT operations are across the whole data set. As you reported in your comment on the earlier answer, COUNT(DISTINCT t631.c536870922) = 4 and COUNT(DISTINCT t649.c536870931) = 3 there. The HAVING clause now gets applied, and nothing matches: you're asking for the case where the counts over the total set (even though there are multiple c1s) are equal, and they are not. The single aggregate row is filtered out, so the query returns no row at all rather than a 0.

What you really want to do is just a version of what you posted in the example that spat out the row counts:

SELECT count(*)
  FROM (SELECT t442.c1     
          FROM t442
             , t658
             , t631
             , t649
         WHERE t442.c1 = t658.c536870930
           AND t442.c200000003 = 'Network'
           AND t442.c536871139 < 2
           AND t631.c536870913 = t442.c1
           AND t658.c536870925 = 1
           AND (   t442.c7 = 6
                OR t442.c7 = 5)
           AND t442.c536870954 > 1141300800
           AND (t442.c240000010 = 0)
           AND t442.c1 = t649.c536870914
           AND t649.c536870939 > 1
         GROUP BY t442.c1
        HAVING COUNT(DISTINCT t631.c536870922) = COUNT(DISTINCT t649.c536870931)
       );

This will give you the number of c1 values that have equal numbers of distinct t631 and t649 entries. Note: You should be very careful about the use of DISTINCT in your queries. For example, in the case where you posted the results above, it is completely unnecessary; oftentimes it acts as a kind of wallpaper to cover over errors in queries that don't return results the way you want due to a missed constraint in the WHERE clause ("Hmm, my query is returning dupes for all these values. Well, a DISTINCT will fix that problem").

Steve Broberg 2009-06-18 18:52:41
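
The pattern in this answer (GROUP BY plus HAVING in an inner query, COUNT(*) in an outer one) can be tried on a toy data set. This is a sketch under assumptions: SQLite via Python's sqlite3 module stands in for Oracle, and the joined table with columns c1, v631, v649 is a hypothetical stand-in for the result of the four-table join.

```python
import sqlite3

# Assumption: SQLite stands in for Oracle; "joined" is a hypothetical
# stand-in for the joined t442/t658/t631/t649 rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE joined (c1 TEXT, v631 TEXT, v649 TEXT)")
conn.executemany(
    "INSERT INTO joined VALUES (?, ?, ?)",
    [
        ("CHG1", "p", "x"),  # CHG1 has 2 distinct v631 values but 1 v649 value
        ("CHG1", "q", "x"),
        ("CHG2", "p", "x"),  # CHG2 and CHG3 have 1 and 1
        ("CHG3", "r", "y"),
    ],
)

# Per-c1 comparison in the inner query, total in the outer query.
matching = conn.execute(
    """
    SELECT COUNT(*)
      FROM (SELECT c1
              FROM joined
             GROUP BY c1
            HAVING COUNT(DISTINCT v631) = COUNT(DISTINCT v649))
    """
).fetchone()[0]
print(matching)  # 2

# Over the whole set the distinct counts differ, which is why the
# ungrouped comparison filters everything out.
whole_set = conn.execute(
    "SELECT COUNT(DISTINCT v631), COUNT(DISTINCT v649) FROM joined"
).fetchone()
print(whole_set)  # (3, 2)
```

The whole-set counts (3 and 2) are not the sums of the per-c1 counts (4 and 3), because DISTINCT is applied across all rows when there is no GROUP BY.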

edit: fixed my final query to return the count you were originally looking for.

Steve Broberg 2009-06-18 19:07:12

Ok, I think I understand. It does look like the places where the query was working (different set of WHERE clauses on those) the result set was identical with or without the HAVING. The query you just provided gives the individual rows and not just the count, but if I wrap that query in a SELECT COUNT(*) FROM (query) I get the results I need.

Dave 2009-06-18 19:12:28

Hah. Beat me to it. :-)

Dave 2009-06-18 19:13:10

+1, great job. This one was bothering me. I just couldn't figure it out. Thanks for the lesson :D

northpole 2009-06-18 19:19:21

+1 Don't use HAVING without GROUP BY.

Carl Manaster 2009-06-18 19:19:30
