views:

124

answers:

4

I need to produce the column "required" in the following table using SQL without using loops and correlated sub queries. Is this possible in SQL 2008?

Date    Customer   Value   Required   Rule
20100101       1      12         12
20100101       2                  0   If no value assign 0
20100101       3      32         32
20100101       4      42         42
20100101       5      15         15
20100102       1                 12   Take last known value
20100102       2                  0   Take last known value
20100102       3      39         39
20100102       4                 42   Take last known value
20100102       5      16         16
20100103       1      13         13
20100103       2      24         24
20100103       3                 39   Take last known value
20100103       4                 42   Take last known value
20100103       5      21         21
20100104       1      14         14
20100104       2                 24   Take last known value
20100104       3                 39   Take last known value
20100104       4      65         65
20100104       5      23         23

Basically I am filling empty "Value" cells with last know value for that customer. Remember the last row may not have a valid value, so you will have to pick it from the row before that with a valid value.

+2  A: 

I need to produce the column "required" in the following table using SQL without using loops and correlated sub queries. Is this possible in SQL 2008?

Impossible. Point. Not possibly on ANY SQL based server, including oracle.

The main problem here is that you rule out loops AND correlated subqueries and any way to retrieve the value at query time will ultimately use another query to find the valid value (actually one per field). This is how SQL works. Yes, you could hide them in a custom scalar function, but still they would contain a logical sub query.

TomTom
A: 

How about a left outer join on the same table where the date is less than the current one and the value is non-empty, ordered by date desc (limit 1), returning zero when null ? (No server available to test at the moment). Unless this counts as a sub-query...

pritaeas
Can anyone come up with this query?
Faiz
Looks like Lieven just did.
pritaeas
A: 

I'm not sure if following counts considering your constraints but it gets the job done.

Test data

DECLARE @Customers TABLE (Date DATETIME, Customer INTEGER, Value INTEGER)

INSERT INTO @Customers VALUES ('20100101', 1, 12  )       
INSERT INTO @Customers VALUES ('20100101', 2, NULL)           
INSERT INTO @Customers VALUES ('20100101', 3, 32  ) 
INSERT INTO @Customers VALUES ('20100101', 4, 42  ) 
INSERT INTO @Customers VALUES ('20100101', 5, 15  ) 
INSERT INTO @Customers VALUES ('20100102', 1, NULL) 
INSERT INTO @Customers VALUES ('20100102', 2, NULL)
INSERT INTO @Customers VALUES ('20100102', 3, 39  )
INSERT INTO @Customers VALUES ('20100102', 4, NULL)
INSERT INTO @Customers VALUES ('20100102', 5, 16  )
INSERT INTO @Customers VALUES ('20100103', 1, 13  )
INSERT INTO @Customers VALUES ('20100103', 2, 24  )
INSERT INTO @Customers VALUES ('20100103', 3, NULL)
INSERT INTO @Customers VALUES ('20100103', 4, NULL)
INSERT INTO @Customers VALUES ('20100103', 5, 21  )
INSERT INTO @Customers VALUES ('20100104', 1, 14  )
INSERT INTO @Customers VALUES ('20100104', 2, NULL)
INSERT INTO @Customers VALUES ('20100104', 3, NULL)
INSERT INTO @Customers VALUES ('20100104', 4, 65  )
INSERT INTO @Customers VALUES ('20100104', 5, 23  )

Query

SELECT  c.Date
        , c.Customer
        , Value = COALESCE(c.Value, cprevious.Value, 0)
FROM    @Customers c
        INNER JOIN (
          SELECT  c.Date
                  , c.Customer
                  , MaxDate = MAX(cdates.Date)
          FROM    @Customers c
                  LEFT OUTER JOIN (
                    SELECT  Date
                            , Customer
                    FROM    @Customers
                  ) cdates ON cdates.Date < c.Date AND cdates.Customer = c.Customer
          GROUP BY
                  c.Date, c.Customer
        ) cmax ON cmax.Date = c.Date AND cmax.Customer = c.Customer                  
        LEFT OUTER JOIN @Customers cprevious ON cprevious.Date = cmax.MaxDate AND cprevious.Customer = cmax.Customer
ORDER BY
        1, 2, 3        

Update statement

UPDATE  @Customers 
SET     Value = c2.Value 
OUTPUT  Inserted.* 
FROM    @Customers c 
        INNER JOIN ( 
          SELECT  c.Date
                  , c.Customer
                  , Value = COALESCE(c.Value, cprevious.Value, 0)
          FROM    @Customers c
                  INNER JOIN (
                    SELECT  c.Date
                            , c.Customer
                            , MaxDate = MAX(cdates.Date)
                    FROM    @Customers c
                            LEFT OUTER JOIN (
                              SELECT  Date
                                      , Customer
                              FROM    @Customers
                            ) cdates ON cdates.Date < c.Date AND cdates.Customer = c.Customer
                    GROUP BY
                            c.Date, c.Customer
                  ) cmax ON cmax.Date = c.Date AND cmax.Customer = c.Customer                  
                  LEFT OUTER JOIN @Customers cprevious ON cprevious.Date = cmax.MaxDate AND cprevious.Customer = cmax.Customer
        ) c2 ON c2.Date = c.Date 
                AND c2.Customer = c.Customer 
Lieven
Thank you for all the effort. But this will handle only cases were valid value is present in previous row or the one before that, rigt? What about cases were the valid value is far from the current one?
Faiz
@Faiz, for those cases, the query becomes a bit more complex :). It requires you to join with the max(c2.Date) < c1.Date.
Lieven
I guess that will return multiple records and the Update will consider only the top most record. Now, how to get them sorted so that the record with the last valid value come at the top, is the next question :(
Faiz
@Faiz, I've updated the answer to handle gaps between dates.
Lieven
"Msg 1015, Level 15, State 1, Line 30An aggregate cannot appear in an ON clause unless it is in a subquery contained in a HAVING clause or select list, and the column being aggregated is an outer reference." :(
Faiz
@Faiz, not sure what to say besides *it works on my machine*. Have you copied the entire script and executed it or did you make any changes?
Lieven
My apologies, there was a small copy paste glitch, damn it!But the result is still lacking something, for ex: at records 14 and 18 there are mismatches.Great work though :)
Faiz
Observation: When the last row value is also NULL it is giving 0 instead of the value from the previous available row...
Faiz
+1  A: 

Faiz,

how about the following query, it does what you want as far as I understand it. The comments explain each step. Take a look at CTEs on Books Online. This example could even be changed to use the new MERGE command for SQL 2008.

/* Test Data & Table */
DECLARE @Customers TABLE
    (Dates datetime,
     Customer integer,
     Value integer) 

    INSERT  INTO @Customers
    VALUES  ('20100101', 1, 12),
        ('20100101', 2, NULL),
        ('20100101', 3, 32),
        ('20100101', 4, 42),
        ('20100101', 5, 15),
        ('20100102', 1, NULL),
        ('20100102', 2, NULL),
        ('20100102', 3, 39),
        ('20100102', 4, NULL),
        ('20100102', 5, 16),
        ('20100103', 1, 13),
        ('20100103', 2, 24),
        ('20100103', 3, NULL),
        ('20100103', 4, NULL),
        ('20100103', 5, 21),
        ('20100104', 1, 14),
        ('20100104', 2, NULL),
        ('20100104', 3, NULL),
        ('20100104', 4, 65),
        ('20100104', 5, 23) ;

/* CustCTE - This gives us a RowNum to allow us to build the recursive CTE CleanCust */
WITH    CustCTE
          AS (SELECT    Customer,
                        Value,
                        Dates,
                        ROW_NUMBER() OVER (PARTITION BY Customer ORDER BY Dates) RowNum
              FROM      @Customers),

/* CleanCust - A recursive CTE. This runs down the list of values for each customer, checking the Value column, if it is null it gets the previous non NULL value.*/
        CleanCust
          AS (SELECT    Customer,
                        ISNULL(Value, 0) Value, /* Ensure we start with no NULL values for each customer */
                        Dates,
                        RowNum
              FROM      CustCte cur
              WHERE     RowNum = 1
              UNION ALL
              SELECT    Curr.Customer,
                        ISNULL(Curr.Value, prev.Value) Value,
                        Curr.Dates,
                        Curr.RowNum
              FROM      CustCte curr
              INNER JOIN CleanCust prev ON curr.Customer = prev.Customer
                                           AND curr.RowNum = prev.RowNum + 1)

/* Update the base table using the result set from the recursive CTE */
    UPDATE trg
    SET Value = src.Value
    FROM    @Customers trg
    INNER JOIN CleanCust src ON trg.Customer = src.Customer
                                AND trg.Dates = src.Dates

/* Display the results */
SELECT * FROM @Customers
WilliamD
Excellent William, I was almost there. I knew that CTE will be the only option (I hate CTEs though) if I have to avoid loops but I wanted to see if someone can come up with some other tricks using TSQL. Great work! I need to performance test this solution with the existing one and see how it performs... :)I think the problem I presented here is a generic one and is often seen in banking projects. Hope this will help many...
Faiz
Can I ask why you hate CTEs? They are extremely useful and can often beat many other solutions. Recursive CTEs are often problematic and costly, that being the nature of recursion though and not CTE's fault.There would be the option of using a quirky update (google for that), but it is a controversial solution.
WilliamD