tags:

views:

1494

answers:

12

I have a table that has a processed_timestamp column -- if a record has been processed then that field contains the datetime it was processed, otherwise it is null.

I want to write a query that returns two rows:

NULL        xx -- count of records with null timestamps
NOT NULL    yy -- count of records with non-null timestamps

Is that possible?

Update: The table is quite large, so efficiency is important. I could just run two queries to calculate each total separately, but I want to avoid hitting the table twice if I can avoid it.

+16  A: 

In MySQL you could do something like

SELECT 
    IF(ISNULL(processed_timestamp), 'NULL', 'NOT NULL') as myfield, 
    COUNT(*) 
FROM mytable 
GROUP BY myfield
Stefan Gehrig
+2  A: 

If it's oracle then you can do:

select decode(field,NULL,'NULL','NOT NULL'), count(*)
from table
group by decode(field,NULL,'NULL','NOT NULL');

I'm sure that other DBs allow for similar trick.

ADEpt
+9  A: 

Try the following, it's vendor-neutral:

select
    'null    ' as type,
    count(*)   as quant
    from       tbl
    where      tmstmp is null
union all
select
    'not null' as type,
    count(*)   as quant
    from       tbl
    where      tmstmp is not null

After having our local DB2 guru look at this, he concurs: none of the solutions presented to date (including this one) can avoid a full table scan (of the table if timestamp is not indexed, or of the indexotherwise). They all scan every record in the table exactly once.

All the CASE/IF/NVL2() solutions do a null-to-string conversion for each row, introducing unnecessary load on the DBMS. This solution does not have that problem.

paxdiablo
This is a pretty large table -- hitting it twice like this is inefficient, no?
Stewart Johnson
No, actually (at least in DB2 which is the DB I use), this solution will be as fast as all the decode/nvl2-type ones - they all have to perform a full table scan (my solution will process the same number of records overall but in two groups) - index on timestamp field reqd in both cases.
paxdiablo
It'll be interesting to try this solution side-by-side with a vendor-specific one when I get to work tomorrow.
Stewart Johnson
I came here to post this solution, but Pax Diablo beat me to it. All the other solutions rely on converting the column to a string, which you are then counting. In this case, chances are you never even have to touch any rows, because all the information you care about is in the index.
Andy Lester
@Pax: I just came here by accident and noticed that this answer had a down-vote from me (I have no idea why I should have done this). Curiously, I was able to revert it to +1, even though it should have been much too old. Strange.
Tomalak
No probs, @Tomalak, I've noticed that before. You can't cancel a vote after a certain amount of time unless the Q/A is edited. But you can change your vote (why?, only the powers that be can answer that). Usually, if someone fixes a problem I downvoted for (but not enough to deserve an upvote), I just edit the Q/A a little so that I can cancel the vote.
paxdiablo
@Pax: Then the down-vote must have been cast before revision #4. I still have no idea why I had cast it in the first place. At any rate - this was not at all meant to be a subtle way of asking for an up-vote from you. But thanks anyway. ;-)
Tomalak
+8  A: 

Oracle:

group by nvl2(field, 'NOT NULL', 'NULL')

bwalliser
Sweet -- that's a nifty custom function. (More here: http://www.java2s.com/Code/Oracle/Char-Functions/NVL2xvalue1value2Returnsvalue1ifxisnotnullifxisnullvalue2isreturned.htm )
Stewart Johnson
+15  A: 

In T-SQL (MS SQL Server), this works:

SELECT
  CASE WHEN Field IS NULL THEN 'NULL' ELSE 'NOT NULL' END FieldContent,
  COUNT(*) FieldCount
FROM
  TheTable
GROUP BY
  CASE WHEN Field IS NULL THEN 'NULL' ELSE 'NOT NULL' END
Tomalak
A: 

Another MySQL method is to use the CASE operator, which can be generalised to more alternatives than IF():

SELECT CASE WHEN processed_timestamp IS NULL THEN 'NULL' 
            ELSE 'NOT NULL' END AS a,
       COUNT(*) AS n 
       FROM logs 
       GROUP BY a
Tom
IF() function works as well - if(processed_timestamp is null, 'null', 'not null')
Nouveau
A: 

I personally like Pax's solution, but if you absolutely require only one row returned (as I had recently), In MS SQL Server 2005/2008 you can "stack" the two queries using a CTE

with NullRows (countOf)
AS
(
    SELECT count(*) 
    FORM table 
    WHERE [processed_timestamp] IS NOT NULL
)
SELECT count(*) AS nulls, countOf
FROM table, NullRows
WHERE [processed_timestamp] IS NULL
GROUP BY countOf

Hope this helps

James Green
But then you're hitting the database twice -- inefficient. (Which must be why Pax deleted his solution.)
Stewart Johnson
Pretty big approcach for such a simple problem, isn't it?
Tomalak
You're only hitting the database twice if your database doesn't optimise it. Probably a safe assumption, but an assumption nonetheless.
Tanktalus
Pax deleted his solution because it started getting downvotes, despite being the only non-vendor specific solution :-). Probably better to have a comprehensive list of all the vendor-specific optimized solutions and readers can choose which one they want.
paxdiablo
Actually, I'll put it back and take the hits - interesting to see how many downvotes it gets....
paxdiablo
Well then this solution should get extra downvotes because it's inefficient AND vendor specific. There's another T-SQL solution that doesn't hit the database twice.
Stewart Johnson
@Stewart Johnson. CTE's *are* ANSI SQL:1999! Also, All aggregate functions (of which count(*) is one) give a scalar result for the set they are given, in this case rows with, then rows without, values for that column.
James Green
A: 

[T-SQL]:

select [case], count(*) tally
from (
  select 
  case when [processed_timestamp] is null then 'null'
  else 'not null'
  end [case]
  from myTable
) a

And you can add into the case statement whatever other values you'd like to form a partition, e.g. today, yesterday, between noon and 2pm, after 6pm on a Thursday.

Unsliced
+1  A: 

Stewart,

Maybe consider this solution. It is (also!) vendor non-specific.

SELECT count([processed_timestamp]) AS notnullrows, 
       count(*) - count([processed_timestamp]) AS nullrows 
FROM table

As for efficiency, this avoids 2x index seeks/table scans/whatever by including the results on one row. If you absolutely require 2 rows in the result, two passes over the set may be unavoidable because of unioning aggregates.

Hope this helps

James Green
A: 
Select Sum(Case When processed_timestamp IS NULL
                         Then 1
                         Else 0
                 End)                                                               not_processed_count,
          Sum(Case When processed_timestamp Is Not NULL
                         Then 1
                         Else 0
                 End)                                                               processed_count,
          Count(1)                                                                total
From table

Edit: didn't read carefully, this one returns a single row.

Aleksey Otrubennikov
A: 

In Oracle

SELECT COUNT(*), COUNT(TIME_STAMP_COLUMN)
FROM TABLE;

count(*) returns the count of all rows

count(column_name) returns the number of rows which are not NULL, so

SELECT COUNT(*) - COUNT(TIME_STAMP_COLUMN) NUL_COUNT,
                  COUNT(TIME_STAMP_COLUMN) NON_NUL_COUNT
FROM TABLE

ought to do the job.

If the column is indexed, you might end up with some sort of range scan and avoid actually reading the table.

EvilTeach
A: 

If your database has an efficient COUNT(*) function for a table, you could COUNT whichever is the smaller number, and subtract.

le dorfier