ansaurus

Question

How do I return my records grouped by NULL and NOT NULL?

Answer 1

+16 A:

In MySQL you could do something like

SELECT 
    IF(ISNULL(processed_timestamp), 'NULL', 'NOT NULL') as myfield, 
    COUNT(*) 
FROM mytable 
GROUP BY myfield

Stefan Gehrig 2008-10-27 10:53:03

Answer 2

+2 A:

If it's oracle then you can do:

select decode(field,NULL,'NULL','NOT NULL'), count(*)
from table
group by decode(field,NULL,'NULL','NOT NULL');

I'm sure that other DBs allow for similar trick.

ADEpt 2008-10-27 10:53:51

Answer 3

+9 A:

Try the following, it's vendor-neutral:

select
    'null    ' as type,
    count(*)   as quant
    from       tbl
    where      tmstmp is null
union all
select
    'not null' as type,
    count(*)   as quant
    from       tbl
    where      tmstmp is not null

After having our local DB2 guru look at this, he concurs: none of the solutions presented to date (including this one) can avoid a full table scan (of the table if timestamp is not indexed, or of the indexotherwise). They all scan every record in the table exactly once.

All the CASE/IF/NVL2() solutions do a null-to-string conversion for each row, introducing unnecessary load on the DBMS. This solution does not have that problem.

paxdiablo 2008-10-27 10:54:19

This is a pretty large table -- hitting it twice like this is inefficient, no?

Stewart Johnson 2008-10-27 10:55:32

No, actually (at least in DB2 which is the DB I use), this solution will be as fast as all the decode/nvl2-type ones - they all have to perform a full table scan (my solution will process the same number of records overall but in two groups) - index on timestamp field reqd in both cases.

paxdiablo 2008-10-27 11:29:25

It'll be interesting to try this solution side-by-side with a vendor-specific one when I get to work tomorrow.

Stewart Johnson 2008-10-27 12:10:41

I came here to post this solution, but Pax Diablo beat me to it. All the other solutions rely on converting the column to a string, which you are then counting. In this case, chances are you never even have to touch any rows, because all the information you care about is in the index.

Andy Lester 2008-10-27 12:54:56

@Pax: I just came here by accident and noticed that this answer had a down-vote from me (I have no idea why I should have done this). Curiously, I was able to revert it to +1, even though it should have been much too old. Strange.

Tomalak 2009-08-10 06:48:59

No probs, @Tomalak, I've noticed that before. You can't cancel a vote after a certain amount of time unless the Q/A is edited. But you can change your vote (why?, only the powers that be can answer that). Usually, if someone fixes a problem I downvoted for (but not enough to deserve an upvote), I just edit the Q/A a little so that I can cancel the vote.

paxdiablo 2009-08-10 07:09:43

@Pax: Then the down-vote must have been cast before revision #4. I still have no idea why I had cast it in the first place. At any rate - this was not at all meant to be a subtle way of asking for an up-vote from you. But thanks anyway. ;-)

Tomalak 2009-08-10 09:07:34

Answer 4

+8 A:

Oracle:

group by nvl2(field, 'NOT NULL', 'NULL')

bwalliser 2008-10-27 10:56:10

Sweet -- that's a nifty custom function. (More here: http://www.java2s.com/Code/Oracle/Char-Functions/NVL2xvalue1value2Returnsvalue1ifxisnotnullifxisnullvalue2isreturned.htm )

Stewart Johnson 2008-10-27 11:00:20

Answer 5

+15 A:

In T-SQL (MS SQL Server), this works:

SELECT
  CASE WHEN Field IS NULL THEN 'NULL' ELSE 'NOT NULL' END FieldContent,
  COUNT(*) FieldCount
FROM
  TheTable
GROUP BY
  CASE WHEN Field IS NULL THEN 'NULL' ELSE 'NOT NULL' END

Tomalak 2008-10-27 10:57:11

Answer 6

A:

Another MySQL method is to use the CASE operator, which can be generalised to more alternatives than IF():

SELECT CASE WHEN processed_timestamp IS NULL THEN 'NULL' 
            ELSE 'NOT NULL' END AS a,
       COUNT(*) AS n 
       FROM logs 
       GROUP BY a

Tom 2008-10-27 11:03:07

IF() function works as well - if(processed_timestamp is null, 'null', 'not null')

Nouveau 2008-10-27 11:05:06

Answer 7

A:

I personally like Pax's solution, but if you absolutely require only one row returned (as I had recently), In MS SQL Server 2005/2008 you can "stack" the two queries using a CTE

with NullRows (countOf)
AS
(
    SELECT count(*) 
    FORM table 
    WHERE [processed_timestamp] IS NOT NULL
)
SELECT count(*) AS nulls, countOf
FROM table, NullRows
WHERE [processed_timestamp] IS NULL
GROUP BY countOf

Hope this helps

James Green 2008-10-27 11:05:48

But then you're hitting the database twice -- inefficient. (Which must be why Pax deleted his solution.)

Stewart Johnson 2008-10-27 11:06:52

Pretty big approcach for such a simple problem, isn't it?

Tomalak 2008-10-27 11:07:09

You're only hitting the database twice if your database doesn't optimise it. Probably a safe assumption, but an assumption nonetheless.

Tanktalus 2008-10-27 11:15:14

Pax deleted his solution because it started getting downvotes, despite being the only non-vendor specific solution :-). Probably better to have a comprehensive list of all the vendor-specific optimized solutions and readers can choose which one they want.

paxdiablo 2008-10-27 11:15:26

Actually, I'll put it back and take the hits - interesting to see how many downvotes it gets....

paxdiablo 2008-10-27 11:17:26

Well then this solution should get extra downvotes because it's inefficient AND vendor specific. There's another T-SQL solution that doesn't hit the database twice.

Stewart Johnson 2008-10-27 11:20:50

@Stewart Johnson. CTE's *are* ANSI SQL:1999! Also, All aggregate functions (of which count(*) is one) give a scalar result for the set they are given, in this case rows with, then rows without, values for that column.

James Green 2008-10-27 13:15:50

Answer 8

A:

[T-SQL]:

select [case], count(*) tally
from (
  select 
  case when [processed_timestamp] is null then 'null'
  else 'not null'
  end [case]
  from myTable
) a

And you can add into the case statement whatever other values you'd like to form a partition, e.g. today, yesterday, between noon and 2pm, after 6pm on a Thursday.

Unsliced 2008-10-27 12:36:27

Answer 9

+1 A:

Stewart,

Maybe consider this solution. It is (also!) vendor non-specific.

SELECT count([processed_timestamp]) AS notnullrows, 
       count(*) - count([processed_timestamp]) AS nullrows 
FROM table

As for efficiency, this avoids 2x index seeks/table scans/whatever by including the results on one row. If you absolutely require 2 rows in the result, two passes over the set may be unavoidable because of unioning aggregates.

Hope this helps

James Green 2008-10-27 13:20:12

Answer 10

A:

Select Sum(Case When processed_timestamp IS NULL
                         Then 1
                         Else 0
                 End)                                                               not_processed_count,
          Sum(Case When processed_timestamp Is Not NULL
                         Then 1
                         Else 0
                 End)                                                               processed_count,
          Count(1)                                                                total
From table

Edit: didn't read carefully, this one returns a single row.

Aleksey Otrubennikov 2008-10-27 13:56:27

Answer 11

A:

In Oracle

SELECT COUNT(*), COUNT(TIME_STAMP_COLUMN)
FROM TABLE;

count(*) returns the count of all rows

count(column_name) returns the number of rows which are not NULL, so

SELECT COUNT(*) - COUNT(TIME_STAMP_COLUMN) NUL_COUNT,
                  COUNT(TIME_STAMP_COLUMN) NON_NUL_COUNT
FROM TABLE

ought to do the job.

If the column is indexed, you might end up with some sort of range scan and avoid actually reading the table.

EvilTeach 2008-10-28 02:48:30

Answer 12

A:

If your database has an efficient COUNT(*) function for a table, you could COUNT whichever is the smaller number, and subtract.

le dorfier 2008-10-28 02:49:34

ansaurus

tags:

views:

answers:

How do I return my records grouped by NULL and NOT NULL?

related questions