ansaurus

Question

Filtering SELECT statement by row with most populated columns

Answer 1

+1 A:

You could look into the Coalesce function but honestly I would probably do the Case statement as you have above.

What reason would you not want to use it?

According to the comments, the table is more than just 4 value fields. But the fear is that a temporary table keeping the highest count of nulls will be necessary.

I feel that the Case solution, potentially implemented into a view is still a viable and good solution.

Jeff Martin 2009-07-14 15:11:31

Firstly because in reality there are many more columns that 4!Secondly, I would like it to be a simple procedure rather than one involving a temporary table. Maybe there is a clean way to do it without making a temporary table and keeping the highest CountOfNulls per Reference?

PCurd 2009-07-14 15:15:10

don't really need a temp table, just create the view with the calculation in it, it should be a 1 time thing.

Jeff Martin 2009-07-14 16:02:39

Jeff is right - you've got the right technique, just don't use a temp table. This could easily be a view.

le dorfier 2009-07-14 17:36:00

Seems there isn't really a simple fix to my problem so I'm marking this as the "answer" as it (with the comments) is the best coverage.

PCurd 2009-07-16 10:52:15

Thanks for the answer, gonna move some of the salient comments into the answer for the search engines...

Jeff Martin 2009-07-16 14:57:00

Answer 2

+1 A:

This is just a hunch, since I haven't seen your database, but it looks like this is one table that should actually be two. As I said, this may have been done for performance or other legit reasons, but the way you need to report over it suggests this table should be split.

Nonetheless, if I had no other alternative, I'd create a column with the count of filled in columns for the row, and programmatically update it whenever the particular record is updated.

Bork Blatt 2009-07-14 15:30:15

Answer 3

A:

Well it's not a great way to do it, but I have put together this code and it works:

SELECT distinct Reference, Value1, Value2, Value3, Value4

FROM [tblHistory]
WHERE Reference+cast(4-(case when Value1 is null then 1 else 0 end 
+ case when Value2 is null then 1 else 0 end 
+ case when Value3 is null then 1 else 0 end
+ case when Value4 is null then 1 else 0 END) AS varchar) IN (

SELECT ro_custref + CAST(MAX(CountOfNonNulls) AS VARCHAR) FROM
(

SELECT ro_custref, 4-(case when Value1 is null then 1 else 0 end 
+ case when Value2 is null then 1 else 0 end 
+ case when Value3 is null then 1 else 0 end
+ case when Value4 is null then 1 else 0 end)
as CountOfNonNulls

FROM [tblHistory]
)l
GROUP BY Reference
)

I don't actually have this table structure so I haven't tested it but it seems to work. The idea is to make a "new" key by adding the highest CountOfNonNulls to the Reference field and using that to limit the select - it means the nasty CASE code is run twice but the other filters I have (not shown) limit the population to roughly 80 rows in my system so I can live with this.

I have yet to see what it would do if there were two rows with the same CountOfNonNulls value but different Value1-Value4 fields - I think it would break. In that case I would probably add the Value1-Value4 fields to my "new" key but that's a bit silly.

Any suggestions of improvement would be much appreciated!

PCurd 2009-07-14 16:11:04

Answer 4

A:


-- count() will not include NULL, so we can avoid making complex conditions
;
with
sum_cnt
(
    Reference,
    cnt
)
as
(
    select 
     Reference, 
     count(Value1) + count(Value2) + count(Value3) + count(Value4) 
    from 
     tblHistory 
    group by 
     Reference
)
select top 1
    Reference
from
    sum_cnt 
order by
    cnt desc

Irawan Soetomo 2009-07-14 17:33:01

That's a really neat solution - sadly I'm using SQL Server 2000 and can't use CTEs.

PCurd 2009-07-14 18:55:24

ansaurus

tags:

views:

answers:

Filtering SELECT statement by row with most populated columns

related questions