I'll try to avoid describing the background here. I now have a query result (not a table) which contains rows like this:

ID      SP1     SP2     SP3     SP4     SP5    SP6     SP7     SP8
1       null    null    2500    1400    700    null    null    null

There may be leading and/or trailing null values around a section of non-null values (which actually represents a decreasing process). And what I want is this:

ID      SP1     SP2     SP3     SP4     SP5    SP6     SP7     SP8
1       2500    2500    2500    1400    700    0       0       0

That is, replace the leading nulls with the first non-null value, and the trailing nulls with 0.
Please advise. I'm working on SQL Server 2000.

+1  A: 

You should probably redo your schema. Whenever you have something that looks like an array in a single row, it's often better to split it out into separate rows.

But, assuming you're stuck with the current schema, I'd just go for a simple:

start transaction;
update TBL set SP8 =   0 where SP8 is null;
update TBL set SP7 = SP8 where SP7 is null;
update TBL set SP6 = SP7 where SP6 is null;
update TBL set SP5 = SP6 where SP5 is null;
update TBL set SP4 = SP5 where SP4 is null;
update TBL set SP3 = SP4 where SP3 is null;
update TBL set SP2 = SP3 where SP2 is null;
update TBL set SP1 = SP2 where SP1 is null;
commit;

(substituting in the correct SQL Server transaction syntax if need be).
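For reference, a sketch of the same batch in SQL Server 2000's own transaction syntax (assuming, as above, that the table is named TBL). Note the updates must run in SP8-to-SP1 order so each column can borrow from the already-fixed column to its right:

```sql
-- Sketch only: the table name TBL is an assumption from the answer above.
BEGIN TRANSACTION

UPDATE TBL SET SP8 = 0   WHERE SP8 IS NULL
UPDATE TBL SET SP7 = SP8 WHERE SP7 IS NULL
UPDATE TBL SET SP6 = SP7 WHERE SP6 IS NULL
UPDATE TBL SET SP5 = SP6 WHERE SP5 IS NULL
UPDATE TBL SET SP4 = SP5 WHERE SP4 IS NULL
UPDATE TBL SET SP3 = SP4 WHERE SP3 IS NULL
UPDATE TBL SET SP2 = SP3 WHERE SP2 IS NULL
UPDATE TBL SET SP1 = SP2 WHERE SP1 IS NULL

COMMIT TRANSACTION
```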

If you don't want to actually change the underlying data, you can use a view, but it's likely to be hideous, and you may want to opt for doing the transformation in whatever application you're using to execute the SQL instead.

One possibility, but I strongly urge you not to do this, and there may be a better vendor-specific way:

  • create a view view8 over the table which leaves all columns untouched except for SP8, which becomes coalesce(sp8,0) (or whatever the SQL Server equivalent is - SP8 if it's not NULL, otherwise 0).
  • create a view view7 over the view view8 which leaves all columns untouched except for SP7 which becomes coalesce(sp7,sp8).
  • create a view view6 over the view view7 which leaves all columns untouched except for SP6 which becomes coalesce(sp6,sp7).
  • blah, blah, blah.
  • create a view view1 over the view view2 which leaves all columns untouched except for SP1 which becomes coalesce(sp1,sp2).
  • use the view view1.
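Concretely, the first two layers of that view chain might look something like this (a sketch only; the base table name TBL and its columns ID, SP1..SP8 are assumed from the question):

```sql
-- Sketch: assumes a base table TBL with columns ID, SP1..SP8.
CREATE VIEW view8 AS
SELECT ID, SP1, SP2, SP3, SP4, SP5, SP6, SP7,
       COALESCE(SP8, 0) AS SP8
FROM TBL
GO

CREATE VIEW view7 AS
SELECT ID, SP1, SP2, SP3, SP4, SP5, SP6,
       COALESCE(SP7, SP8) AS SP7, SP8
FROM view8
GO
-- ...and so on down to view1, which is the one you'd actually query.
```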

As I said, a massive kludge and please, for the love of whatever gods you believe in, don't use it. But sometimes needs dictate our actions so I'm putting it out there just in case.

All care, no responsibility, test (and profile) it yourself.


And, having posted that and discovered Damien has a more compact version, I'd also like to offer the following.

It's sometimes useful to sacrifice space for time (make things faster at the expense of more disk space).

You can create another 8 columns, MORPHSP1 through MORPHSP8, to store the morphed values that I suggested in my first solution.

This would normally violate 3NF but that's actually okay if you do two things: (1) understand the ramifications; and (2) mitigate the chance of inconsistent data.

By the use of insert/update triggers, you can actually guarantee that the data will remain consistent.

Have your trigger do the following whenever a row changes.

set MORPHSP8 to coalesce (SP8,0)
set MORPHSP7 to coalesce (SP7,MORPHSP8)
set MORPHSP6 to coalesce (SP6,MORPHSP7)
set MORPHSP5 to coalesce (SP5,MORPHSP6)
set MORPHSP4 to coalesce (SP4,MORPHSP5)
set MORPHSP3 to coalesce (SP3,MORPHSP4)
set MORPHSP2 to coalesce (SP2,MORPHSP3)
set MORPHSP1 to coalesce (SP1,MORPHSP2)
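On SQL Server 2000, that trigger might be sketched as below. Two assumptions: the table is named TBL with primary key ID, and the COALESCE calls are expanded (MORPHSP7 = COALESCE(SP7, SP8, 0) rather than COALESCE(SP7, MORPHSP8)) because every SET clause in a single UPDATE sees the pre-update column values, so MORPHSP8 can't be read back within the same statement:

```sql
-- Sketch only: the table name TBL and key column ID are assumptions.
CREATE TRIGGER trg_TBL_morph ON TBL
AFTER INSERT, UPDATE
AS
UPDATE t SET
    MORPHSP8 = COALESCE(t.SP8, 0),
    MORPHSP7 = COALESCE(t.SP7, t.SP8, 0),
    MORPHSP6 = COALESCE(t.SP6, t.SP7, t.SP8, 0),
    MORPHSP5 = COALESCE(t.SP5, t.SP6, t.SP7, t.SP8, 0),
    MORPHSP4 = COALESCE(t.SP4, t.SP5, t.SP6, t.SP7, t.SP8, 0),
    MORPHSP3 = COALESCE(t.SP3, t.SP4, t.SP5, t.SP6, t.SP7, t.SP8, 0),
    MORPHSP2 = COALESCE(t.SP2, t.SP3, t.SP4, t.SP5, t.SP6, t.SP7, t.SP8, 0),
    MORPHSP1 = COALESCE(t.SP1, t.SP2, t.SP3, t.SP4, t.SP5, t.SP6, t.SP7, t.SP8, 0)
FROM TBL t
JOIN inserted i ON i.ID = t.ID
```

(The trigger updates the same table it fires on; direct trigger recursion is off by default in SQL Server, so this doesn't re-fire itself unless the RECURSIVE_TRIGGERS database option has been turned on.)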

That way, you only incur the cost when the data changes, not every single time you use the data. On a table where reads outnumber writes (and that's the vast majority), this can lead to an impressive performance improvement.

paxdiablo
Erm... the schema doesn't look like this. In fact I've gone to some effort to dynamically transform rows into columns to make it look like this. The format is determined by the report I'm going to make.
phoenies
Well, I gather this is for reporting purposes rather than an actual table structure. I'm guessing that the data has already been pivoted, when it may have been more appropriate to fix the data before the pivot happened. As to the views/COALESCE, it's not too horrific in this case (see my answer), bearing in mind that COALESCE can take multiple expressions (not just 2).
Damien_The_Unbeliever
My advice is to not use SQL to generate reports. It's a relational algebra meant for creating data sets and it's usually far better to use it _just_ for getting data. Generate your reports once you have the data.
paxdiablo
@Damien How to fix the data before the pivot happened? Each null value is then a missing row of data.
phoenies
@paxdiablo Then I have to use VBA in Excel. I think SQL is more accurate and efficient.
phoenies
Don't unnecessarily discount VBA - it can be plenty efficient :-) You could load _all_ those values up into a sheet then create and populate a huge range of `=if(Sheet1!a8="",0,Sheet1!a8)`-type cells on `Sheet2` _very_ quickly. Yes, VBA _can_ be slow but only when under the control of a novice using `for` loops to do their work.
paxdiablo
@paxdiablo I've done a lot of VBA programming in the last half year. My experience is that it's never quick when you write something to worksheets. There may be events, re-calculation and re-drawing as well.
phoenies
Thank you too for your advice and patience. :)
phoenies
You can turn off re-drawing during script runs but I'm not sure that would be an issue here. Have your script turn off redraw, destroy sheet2, requery on sheet1, work out the size of the data on sheet1, and recreate sheet2. Blat your formulae onto the proper range based on the sheet1 data height (two blats, column 8 then columns 1-7, since column 8 is special with no column 9 to draw info from). Then turn redraw back on. I have spreadsheets that do this for several hundred rows and the work is sub-second. But still, if Damien's solution adds so little overhead, _that's_ the one I'd choose for this question.
paxdiablo
I've marked this CW since it's a worse answer than Damien's, I'm just leaving it here for reference.
paxdiablo
+2  A: 
SELECT
    ID,
    COALESCE(SP1,SP2,SP3,SP4,SP5,SP6,SP7,SP8,0) as SP1,
    COALESCE(SP2,SP3,SP4,SP5,SP6,SP7,SP8,0) as SP2,
    COALESCE(SP3,SP4,SP5,SP6,SP7,SP8,0) as SP3,
    COALESCE(SP4,SP5,SP6,SP7,SP8,0) as SP4,
    COALESCE(SP5,SP6,SP7,SP8,0) as SP5,
    COALESCE(SP6,SP7,SP8,0) as SP6,
    COALESCE(SP7,SP8,0) as SP7,
    COALESCE(SP8,0) as SP8
FROM
    (<your existing query>) t

COALESCE takes a number of expressions, and returns the first non-null value.

Damien_The_Unbeliever
Actually this is somewhat better than my multi-view option but, please, still check the performance. Per-row functions never scale well.
paxdiablo
I just profiled against 100000 rows of sample data (far higher than I'd expect for a report) and it seems to add a 2% overhead, compared to a plain select query from a table. Obviously, if there's any more complication to the base query (which we already know there is), we'd expect this overhead to reduce. YMMV.
Damien_The_Unbeliever
WOW, I didn't know there was such a function. Thanks a lot.
phoenies
Then I'd go for this one. 2% doesn't seem too bad and you're right - if the underlying query is more complicated than the simple row extraction, the ratio should reduce even more.
paxdiablo