views:

371

answers:

5

Just wondering, is there any quick way to count all the NULL values (from all columns) in a MySQL table?

Thanks for any idea!

A: 
SELECT COUNT(*) FROM yourTable WHERE yourField IS NULL;
e4c5
But that is only for one column. Not all the null values in the **table**.
klausbyskov
I think the OP wants ALL NULL values, not only those from a particular column. I wonder what the practical use of this would be.
anthares
@anthares: I am generating a report for our contact database. Filling in all the columns for a contact is preferred but not mandatory, so I just want to generate a quick report on the completion percentage of contact information across the entire table. That's the reason I am asking. As for the original answer: yes, I want to count across all the columns and not just one. Any help please?
Nirmal
If you have a table where all the columns contain null values, that is a good indication that the indexes and/or normalization need to be looked at. Even so, if you have rows with all nulls, it just takes a little bit of typing: SELECT COUNT(*) FROM yourTable WHERE yourField1 IS NULL AND yourField2 IS NULL AND yourField3 IS NULL AND ...;
e4c5
Check my solution below for a solution for all columns.
Pentium10
+4  A: 

Something like

select id
       , sum(case when col1 is null then 1 else 0 end) col1
       , sum(case when col2 is null then 1 else 0 end) col2
       , sum(case when col3 is null then 1 else 0 end) col3
from contacts
group by id
APC
This involves knowing the columns of that table; on a large table it's not productive. Check my solution.
Pentium10
@Pentium - what you say is true, but there are ways around that. For a one-off query, a simple regex against the cut'n'pasted output of a DESCRIBE would be enough. Or the query could be generated from the INFORMATION_SCHEMA.
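
A minimal sketch of the INFORMATION_SCHEMA route mentioned above, not a definitive implementation. The schema name `testing` and the table name `contacts` are assumptions; substitute your own. It builds one `SUM(col IS NULL)` term per column and runs the generated statement, dropping the `group by id` so that the result is a single row of per-column totals.

```sql
-- Assumed names: schema 'testing', table 'contacts'.
-- GROUP_CONCAT output is cut off at group_concat_max_len (default 1024),
-- so raise it for wide tables.
SET SESSION group_concat_max_len = 100000;

-- Build "SUM(`col1` IS NULL) AS `col1`, SUM(`col2` IS NULL) AS `col2`, ..."
SELECT GROUP_CONCAT(
         CONCAT('SUM(`', column_name, '` IS NULL) AS `', column_name, '`')
         ORDER BY ordinal_position SEPARATOR ', ')
  INTO @cols
  FROM information_schema.columns
 WHERE table_schema = 'testing' AND table_name = 'contacts';

-- Run the generated query: one row, one NULL count per column.
SET @sql = CONCAT('SELECT ', @cols, ' FROM `testing`.`contacts`');
PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
```

In MySQL, `col IS NULL` evaluates to 1 or 0, so summing it counts the NULLs. On an empty table each SUM returns NULL rather than 0.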
APC
A: 

You should really do this using not only SQL, but also the programming language at your disposal:

  1. Obtain the metadata of each table - either using DESCRIBE table, or using a built-in metadata functionality in your db access technology

  2. Create queries of the following type in a loop for each column. (in pseudo-code)

    int nulls = 0;
    for (String columnName : columnNames) {
        query = "SELECT COUNT(*) FROM tableName WHERE " + columnName + " IS NULL";
        Result result = executeQuery(query);
        // read the COUNT(*) value itself; the result always has exactly one row
        nulls += result.getInt(1);
    }
    
Bozho
This will execute a select query for each column, and there can be hundreds of them on a large table. You can get the metadata of the table and use it in a single SQL query to count the null values in the columns; check out my solution.
Pentium10
It appears to me that he is doing one-time statistics, so he can let it run for a whole day if needed :)
Bozho
A: 

Something like this (substitute COL_COUNT as appropriate):

select count(*) * COL_COUNT - count(col1) - count(col2) - ... - count(col_n) from table;
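
To make the arithmetic concrete, here is a small sketch on a made-up table (the table `demo_contacts`, its three columns, and its rows are assumptions for illustration only). It relies on `COUNT(*)` counting rows while `COUNT(col)` counts only non-NULL values, so the difference is the number of NULLs.

```sql
-- Hypothetical 3-column table, so COL_COUNT = 3.
CREATE TEMPORARY TABLE demo_contacts (
  name  VARCHAR(50),
  email VARCHAR(50),
  phone VARCHAR(20)
);
INSERT INTO demo_contacts VALUES
  ('Ann', 'ann@example.com', NULL),
  ('Bob', NULL,              NULL),
  ('Cy',  'cy@example.com',  '555-0100');

SELECT COUNT(*) * 3 - COUNT(name) - COUNT(email) - COUNT(phone) AS total_nulls,
       100 * (COUNT(name) + COUNT(email) + COUNT(phone)) / (COUNT(*) * 3) AS pct_complete
FROM demo_contacts;
-- 3 rows * 3 columns = 9 cells, 3 + 2 + 1 = 6 filled:
-- total_nulls = 3, pct_complete is about 66.67
```

The second expression is one possible way to get the completion percentage the question asks about.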
janm
This involves knowing the columns of that table; on a large table it's not productive. Check my solution.
Pentium10
+2  A: 

If you want this done exclusively in MySQL and without enumerating all of the columns, take a look at this solution.

With this method you don't have to hard-code the columns or their number. If your table schema is modified, this method still works and requires no code change.

SET @db = 'testing'; -- database
SET @tb = 'fuzzysearch'; -- table
SET @x = ''; -- will hold the column names with ASCII method applied to retrieve the number of the first char
SET @numcolumns = 0; -- will hold the number of columns in the table

-- figure out how many columns we have
SELECT count(*) into @numcolumns FROM information_schema.columns where table_name=@tb and table_schema=@db;

-- we have to prepare some query from all columns of the table
SELECT group_concat(CONCAT('ASCII(',column_name,')') SEPARATOR ",") into @x from information_schema.columns where table_name=@tb and table_schema=@db;
-- after this query we have a variable separated with comma like
-- ASCII(col1),ASCII(col2),ASCII(col3)

-- we now generate a query that concatenates the columns using a comma as separator (CONCAT_WS omits NULL values)
-- then figure out how many commas are in that string (this is done with length(value)-length(replace(value,',','')))
-- the number of commas + 1 is how many non-NULL columns we have in that row (the +1 is needed because a single value has no comma)
-- then we deduct that number from the known number of columns, calculated previously
-- caveat: for a row where every column is NULL, CONCAT_WS returns '' and the formula reports one NULL too few
SET @s = CONCAT('SELECT @numcolumns - (length(CONCAT_WS(\',\',', @x, '))-length(replace(CONCAT_WS(\',\',', @x, '),\',\',\'\')) + 1) FROM ',@db,'.',@tb,';');
PREPARE stmt FROM @s;
EXECUTE stmt;
-- this returns, for each row, the number of NULL columns
-- I will leave it to you to wrap this in SUM() if you want the NULL count for the whole table
DEALLOCATE PREPARE stmt;
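
One possible way to get the whole-table total mentioned in the last comment, offered as a sketch rather than the definitive version. It assumes `@db`, `@tb`, `@x` and `@numcolumns` are still set from the script above; the variable `@list` and the alias `total_nulls` are names introduced here. The `IF(... = '', 0, ...)` guard handles rows in which every column is NULL, where `CONCAT_WS` returns an empty string and "commas + 1" would otherwise count one value.

```sql
-- Assumes @db, @tb, @x and @numcolumns were set by the script above.
SET @list = CONCAT('CONCAT_WS(\',\',', @x, ')');

-- Per row: total columns minus non-NULL columns, summed over the table.
SET @s = CONCAT(
  'SELECT SUM(@numcolumns - IF(', @list, ' = \'\', 0, ',
  'LENGTH(', @list, ') - LENGTH(REPLACE(', @list, ',\',\',\'\')) + 1)) ',
  'AS total_nulls FROM `', @db, '`.`', @tb, '`');

PREPARE stmt FROM @s;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
```

Dividing `total_nulls` by the row count times `@numcolumns` would give the share of empty cells.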

ASCII() is used to avoid reading and concatenating very long column values for nothing. It also keeps us safe when a value contains a comma, even as its first character.

Since you are working with reports, you may find this helpful, as it can be reused for each table if you wrap it in a reusable routine.

I tried to leave as many comments as possible.

Let's break the compact solution above into pieces, working backwards:

I wanted to end up with a query like this

SELECT totalcolumns - notnullcolumns from table; -- to return null columns for each row

The first one is easy to calculate by running:

SELECT count(*) FROM information_schema.columns where table_name=@tb and table_schema=@db;

The second one, notnullcolumns, is a bit of a pain. After examining the functions available in MySQL, we find that CONCAT_WS skips NULL values.

So running a query like this:

SELECT CONCAT_WS(",","First name",NULL,"Last Name");
returns: 'First name,Last Name'

This is good: we get rid of the NULL values in the list. But how do we find out how many columns were actually concatenated?

Well, that is tricky. We have to count the commas and add 1 to get the number of columns actually concatenated.

For this trick we use the following SQL expression

select length(value)-length(replace(value,',','')) +1 from table

OK, so now we have the number of concatenated columns.
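
As a quick check of the comma-counting trick, here it is applied to the literal values from the CONCAT_WS example (the column aliases are just illustrative names):

```sql
SELECT CONCAT_WS(',', 'First name', NULL, 'Last Name') AS joined,
       LENGTH(CONCAT_WS(',', 'First name', NULL, 'Last Name'))
       - LENGTH(REPLACE(CONCAT_WS(',', 'First name', NULL, 'Last Name'), ',', ''))
       + 1 AS non_null_values;
-- joined = 'First name,Last Name' (20 characters, 19 without the comma)
-- non_null_values = 20 - 19 + 1 = 2
```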

But the harder part is coming next.

We have to list all the columns for CONCAT_WS().
We need to have something like this:

SELECT CONCAT_WS(",",col1,col2,col3,col4,col5);

This is where we have to make use of prepared statements, as we have to build an SQL query dynamically from columns that are not yet known. We don't know how many columns there will be in our table.

So for this we use data from the information_schema.columns table. We need to pass the table name, but also the database name, as the same table name may exist in separate databases.

We need a query that returns the list col1,col2,col3,col4,col5 as a string that we can place inside the CONCAT_WS call.

So for this we run a query

SELECT group_concat(column_name SEPARATOR ",") into @x from information_schema.columns where table_name=@tb and table_schema=@db;

One more thing to mention. Counting the concatenated columns with length() and replace() only works if none of the values contain a comma. Also note that the cells in our database may hold really long values. To handle both problems we wrap each column in ASCII(), which returns the numeric code of the first character (a number, so the result can never contain a comma) and returns NULL for NULL columns.

That being said, we can compact all of this into the comprehensive solution above.
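
A small illustration of how ASCII() behaves on the cases that matter here, using literal values rather than real columns (the aliases are illustrative names):

```sql
SELECT ASCII(',abc') AS leading_comma,   -- 44: the code of ',', a number, not a comma
       ASCII('')     AS empty_string,    -- 0: an empty string is still counted as non-NULL
       ASCII(NULL)   AS null_value,      -- NULL: so CONCAT_WS skips it
       CONCAT_WS(',', ASCII(',x'), ASCII(NULL), ASCII('y')) AS joined;
-- joined = '44,121': two values, exactly one separator comma
```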

Pentium10
+1, although this is a bit hard to understand and maintain
Bozho
I added more comments. This method doesn't need maintenance: if he alters the table schema, it will keep working without a code change, so it can be reused in other cases.
Pentium10
That was really long and wonderful, and solved the issue I had. Any numbers on the performance of this approach? Thanks!
Nirmal
I had to add `SET group_concat_max_len = 2048;` at the start to ensure that GROUP_CONCAT accommodates the numerous fields.
Nirmal
Huh, you have a really big table. How many columns do you have? I don't have any numbers on the performance, but since you use it for a report, it can afford to take a bit longer than a response for an interface.
Pentium10
74 columns with moderately long column names. You are correct. I will be running the aggregation as a cron job, so performance is not an issue. Thanks again.
Nirmal