tags:

views:

395

answers:

3

[This is on an iSeries/DB2 database if that makes any difference]

I want to write a procedure to identify columns that are left as blank or zero (given a list of tables).

Assuming I can pull out table and column definitions from the central system tables, how should I check the above condition? My first guess is for each column generate a statement dynamically such as:

select count(*) from my_table where my_column != 0

and to check if this returns zero rows, but is there a better/faster/standard way to do this?

NB This just needs to handle simple character, integer/decimal fields, nothing fancy!

A: 

I'm assuming you mean you want to know if there are any values in all the rows of a given column. If your column can have "blanks" you're probably going to need to add an OR NOT NULL to your WHERE clause to get the correct answer.

Neil D
+2  A: 

Yes, typically, I would do something like this in SQL Server:

SELECT
REPLACE(REPLACE(REPLACE(
'
    SELECT COUNT(*) AS [COUNT NON-EMPTY IN {TABLE_NAME}.{COLUMN_NAME}]
    FROM [{TABLE_SCHEMA}].[{TABLE_NAME}]
    WHERE [{COLUMN_NAME}] IS NOT NULL
     OR [{COLUMN_NAME}] <> 0
'
, '{TABLE_SCHEMA}', c.TABLE_SCHEMA)
, '{TABLE_NAME}', c.TABLE_NAME)
, '{COLUMN_NAME}', c.COLUMN_NAME) AS [SQL]
FROM INFORMATION_SCHEMA.COLUMNS c
INNER JOIN INFORMATION_SCHEMA.TABLES t
    ON t.TABLE_TYPE = 'BASE TABLE'
    AND c.TABLE_CATALOG = t.TABLE_CATALOG
    AND c.TABLE_SCHEMA = t.TABLE_SCHEMA
    AND c.TABLE_NAME = t.TABLE_NAME
    AND c.DATA_TYPE = 'int'

You can get a lot fancier by doing UNIONs of the entire query and checking the IS_NULLABLE on each column and obviously you might have different requirements for different data types, and skipping identity columns, etc.

Cade Roux
+3  A: 

To check for columns that contain only NULLs on DB2:

  1. Execute RUNSTATS on your database (http://www.ibm.com/developerworks/data/library/techarticle/dm-0412pay/)
  2. Check the database statistics by quering SYSSTAT.TABLES and SYSSTAT.COLUMNS . Comparing SYSSTAT.TABLES.CARD and SYSSTAT.COLUMNS.NUMNULLS will tell you what you need. An example could be:

 select t.tabschema, t.tabname, c.colname 
 from sysstat.tables t, sysstat.columns c 
 where ((t.tabschema = 'MYSCHEMA1' and t.tabname='MYTABLE1') or 
        (t.tabschema = 'MYSCHEMA2' and t.tabname='MYTABLE2') or 
        (...)) and 
      t.tabschema = c.tabschema and t.tabname = c.tabname and
      t.card = c.numnulls

More on system stats e.g. here: http://publib.boulder.ibm.com/infocenter/db2luw/v8/index.jsp?topic=/com.ibm.db2.udb.doc/admin/r0001070.htm and http://publib.boulder.ibm.com/infocenter/db2luw/v8/index.jsp?topic=/com.ibm.db2.udb.doc/admin/r0001073.htm

Similarly, you can use SYSSTAT.COLUMNS.AVGCOLLEN to check for empty columns (just it doesn't seem to work for LOBs).

EDIT: And, to check for columns that contain only zeros, use try comparing HIGH2KEY and LOW2KEY in SYSSTAT.COLUMNS.

david a.