I'll try to avoid describing the background here. I now have a query result (not a table) which contains rows like this:

ID      SP1     SP2     SP3     SP4     SP5    SP6     SP7     SP8
1       null    null    2500    1400    700    null    null    null

There may be leading and/or trailing null values around a section of non-null values (which actually represents a decreasing process). And what I want is this:

ID      SP1     SP2     SP3     SP4     SP5    SP6     SP7     SP8
1       2500    2500    2500    1400    700    0       0       0

That is, replace the leading nulls with the first non-null value, and the trailing nulls with 0.
Please advise. I'm working on SQL Server 2000.

+1  A: 

You should probably redo your schema. Whenever you have something that looks like an array in a single row, it's often better to split it out into separate rows.

But, assuming you're stuck with the current schema, I'd just go for a simple:

start transaction;
update TBL set SP8 =   0 where SP8 is null;
update TBL set SP7 = SP8 where SP7 is null;
update TBL set SP6 = SP7 where SP6 is null;
update TBL set SP5 = SP6 where SP5 is null;
update TBL set SP4 = SP5 where SP4 is null;
update TBL set SP3 = SP4 where SP3 is null;
update TBL set SP2 = SP3 where SP2 is null;
update TBL set SP1 = SP2 where SP1 is null;
commit;

(substituting in the correct SQL Server transaction syntax if need be).
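For reference, a sketch of the same batch in SQL Server 2000's own transaction syntax (assuming, as above, that the table is named TBL). Note the updates must run in SP8-to-SP1 order so each column can borrow from the already-fixed column to its right:

```sql
-- Sketch only: the table name TBL is an assumption from the answer above.
BEGIN TRANSACTION

UPDATE TBL SET SP8 = 0   WHERE SP8 IS NULL
UPDATE TBL SET SP7 = SP8 WHERE SP7 IS NULL
UPDATE TBL SET SP6 = SP7 WHERE SP6 IS NULL
UPDATE TBL SET SP5 = SP6 WHERE SP5 IS NULL
UPDATE TBL SET SP4 = SP5 WHERE SP4 IS NULL
UPDATE TBL SET SP3 = SP4 WHERE SP3 IS NULL
UPDATE TBL SET SP2 = SP3 WHERE SP2 IS NULL
UPDATE TBL SET SP1 = SP2 WHERE SP1 IS NULL

COMMIT TRANSACTION
```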

If you don't want to actually change the underlying data, you can use a view, but it's likely to be hideous, and you may want to opt for doing the transformation in whatever application you're using to execute the SQL instead.

One possibility, but I strongly urge you not to do this, and there may be a better vendor-specific way:

  • create a view view8 over the table which leaves all columns untouched except for SP8, which becomes coalesce(sp8,0) (or whatever the SQL Server equivalent is - SP8 if it's not NULL, otherwise 0).
  • create a view view7 over the view view8 which leaves all columns untouched except for SP7 which becomes coalesce(sp7,sp8).
  • create a view view6 over the view view7 which leaves all columns untouched except for SP6 which becomes coalesce(sp6,sp7).
  • blah, blah, blah.
  • create a view view1 over the view view2 which leaves all columns untouched except for SP1 which becomes coalesce(sp1,sp2).
  • use the view view1.
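Concretely, the first two layers of that view chain might look something like this (a sketch only; the base table name TBL and its columns ID, SP1..SP8 are assumed from the question):

```sql
-- Sketch: assumes a base table TBL with columns ID, SP1..SP8.
CREATE VIEW view8 AS
SELECT ID, SP1, SP2, SP3, SP4, SP5, SP6, SP7,
       COALESCE(SP8, 0) AS SP8
FROM TBL
GO

CREATE VIEW view7 AS
SELECT ID, SP1, SP2, SP3, SP4, SP5, SP6,
       COALESCE(SP7, SP8) AS SP7, SP8
FROM view8
GO
-- ...and so on down to view1, which is the one you'd actually query.
```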

As I said, a massive kludge and please, for the love of whatever gods you believe in, don't use it. But sometimes needs dictate our actions so I'm putting it out there just in case.

All care, no responsibility, test (and profile) it yourself.


And, having posted that and discovered Damien has a more compact version, I'd also like to offer the following.

It's sometimes useful to sacrifice space for time (make things faster at the expense of more disk space).

You can create another 8 columns, MORPHSP1 through MORPHSP8, to store the morphed values that I suggested in my first solution.

This would normally violate 3NF but that's actually okay if you do two things: (1) understand the ramifications; and (2) mitigate the chance of inconsistent data.

By the use of insert/update triggers, you can actually guarantee that the data will remain consistent.

Have your trigger do the following whenever a row changes.

set MORPHSP8 to coalesce (SP8,0)
set MORPHSP7 to coalesce (SP7,MORPHSP8)
set MORPHSP6 to coalesce (SP6,MORPHSP7)
set MORPHSP5 to coalesce (SP5,MORPHSP6)
set MORPHSP4 to coalesce (SP4,MORPHSP5)
set MORPHSP3 to coalesce (SP3,MORPHSP4)
set MORPHSP2 to coalesce (SP2,MORPHSP3)
set MORPHSP1 to coalesce (SP1,MORPHSP2)
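On SQL Server 2000, that trigger might be sketched as below. Two assumptions: the table is named TBL with primary key ID, and the COALESCE calls are expanded (MORPHSP7 = COALESCE(SP7, SP8, 0) rather than COALESCE(SP7, MORPHSP8)) because every SET clause in a single UPDATE sees the pre-update column values, so MORPHSP8 can't be read back within the same statement:

```sql
-- Sketch only: the table name TBL and key column ID are assumptions.
CREATE TRIGGER trg_TBL_morph ON TBL
AFTER INSERT, UPDATE
AS
UPDATE t SET
    MORPHSP8 = COALESCE(t.SP8, 0),
    MORPHSP7 = COALESCE(t.SP7, t.SP8, 0),
    MORPHSP6 = COALESCE(t.SP6, t.SP7, t.SP8, 0),
    MORPHSP5 = COALESCE(t.SP5, t.SP6, t.SP7, t.SP8, 0),
    MORPHSP4 = COALESCE(t.SP4, t.SP5, t.SP6, t.SP7, t.SP8, 0),
    MORPHSP3 = COALESCE(t.SP3, t.SP4, t.SP5, t.SP6, t.SP7, t.SP8, 0),
    MORPHSP2 = COALESCE(t.SP2, t.SP3, t.SP4, t.SP5, t.SP6, t.SP7, t.SP8, 0),
    MORPHSP1 = COALESCE(t.SP1, t.SP2, t.SP3, t.SP4, t.SP5, t.SP6, t.SP7, t.SP8, 0)
FROM TBL t
JOIN inserted i ON i.ID = t.ID
```

(The trigger updates the same table it fires on; direct trigger recursion is off by default in SQL Server, so this doesn't re-fire itself unless the RECURSIVE_TRIGGERS database option has been turned on.)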

That way, you only incur the cost when the data changes, not every single time you use the data. On a table where reads outnumber writes (and that's the vast majority), this can lead to an impressive performance improvement.

paxdiablo
Erm... the schema doesn't look like this. In fact I've gone to some effort to dynamically transform rows into columns to make it look like this. The format is determined by the report I'm going to make.
phoenies
Well, I gather this is for reporting purposes rather than an actual table structure. I'm guessing that the data has already been pivoted, when it may have been more appropriate to fix the data before the pivot happened. As to the views/COALESCE, it's not too horrific in this case (see my answer), bearing in mind that COALESCE can take multiple expressions (not just 2).
Damien_The_Unbeliever
My advice is to not use SQL to generate reports. It's a relational algebra meant for creating data sets and it's usually far better to use it _just_ for getting data. Generate your reports once you have the data.
paxdiablo
@Damien How to fix the data before the pivot happened? Each null value is then a missing row of data.
phoenies
@paxdiablo Then I have to use VBA in Excel. I think SQL is more accurate and efficient.
phoenies
Don't unnecessarily discount VBA - it can be plenty efficient :-) You could load _all_ those values up into a sheet then create and populate a huge range of `=if(Sheet1!a8="",0,Sheet1!a8)`-type cells on `Sheet2` _very_ quickly. Yes, VBA _can_ be slow but only when under the control of a novice using `for` loops to do their work.
paxdiablo
@paxdiablo I've done a lot of VBA programming in the last half year. My experience is that it's never quick when you write something to worksheets. There may be events, re-calculation and re-drawing as well.
phoenies
Thank you too for your advice and patience. :)
phoenies
You can turn off re-drawing during script runs but I'm not sure that would be an issue here. Have your script turn off redraw, destroy sheet2, requery on sheet1, work out the size of the data on sheet1, and recreate sheet2. Blat your formulae onto the proper range based on the sheet1 data height (two blats, column 8 then columns 1-7, since column 8 is special with no column 9 to draw info from). Then turn redraw back on. I have spreadsheets that do this for several hundred rows and the work is sub-second. But still, if Damien's solution adds so little overhead, _that's_ the one I'd choose for this question.
paxdiablo
I've marked this CW since it's a worse answer than Damien's, I'm just leaving it here for reference.
paxdiablo
+2  A: 
SELECT
    ID,
    COALESCE(SP1,SP2,SP3,SP4,SP5,SP6,SP7,SP8,0) as SP1,
    COALESCE(SP2,SP3,SP4,SP5,SP6,SP7,SP8,0) as SP2,
    COALESCE(SP3,SP4,SP5,SP6,SP7,SP8,0) as SP3,
    COALESCE(SP4,SP5,SP6,SP7,SP8,0) as SP4,
    COALESCE(SP5,SP6,SP7,SP8,0) as SP5,
    COALESCE(SP6,SP7,SP8,0) as SP6,
    COALESCE(SP7,SP8,0) as SP7,
    COALESCE(SP8,0) as SP8
FROM
    (<your existing query>) t

COALESCE takes a number of expressions, and returns the first non-null value.

Damien_The_Unbeliever
Actually this is somewhat better than my multi-view option but, please, still check the performance. Per-row functions never scale well.
paxdiablo
I just profiled against 100000 rows of sample data (far higher than I'd expect for a report) and it seems to add a 2% overhead, compared to a plain select query from a table. Obviously, if there's any more complication to the base query (which we already know there is), we'd expect this overhead to reduce. YMMV.
Damien_The_Unbeliever
WOW, I didn't know there was such a function. Thanks a lot.
phoenies
Then I'd go for this one. 2% doesn't seem too bad and you're right - if the underlying query is more complicated than the simple row extraction, the ratio should reduce even more.
paxdiablo