views:

209

answers:

3

I am trying to write T-sql script which will find "open" records for one table

Structure of data is following

Id (int PK)      Ts (datetime)       Art_id (int)     Amount (float)
1                '2009-01-01'        1                1
2                '2009-01-05'        1                -1
3                '2009-01-10'        1                1
4                '2009-01-11'        1                -1
5                '2009-01-13'        1                1
6                '2009-01-14'        1                1
7                '2009-01-15'        2                1
8                '2009-01-17'        2                -1
9                '2009-01-18'        2                1

According to my needs I am trying to show only records after last sum for every one articles where 0 sorting by date of last running sum of zero value. So I am trying to abstract (show) records 5 and 6 for Art_id=1 and record 9 for art_id=2. I am using MSSQL2005 and my table has around 30K records with 6000 distinct values of ART_ID.

+1  A: 

Complete working query:

SELECT 
  * 
FROM TABLE_NAME E
JOIN
  (SELECT
    C.ART_ID,
    MAX(TS) MAX_TS
  FROM
    (SELECT 
      ART_ID,
      TS,
      COALESCE((SELECT SUM(AMOUNT) FROM TABLE_NAME B WHERE (B.Art_id = A.Art_id) AND (B.Ts < A.Ts)),0) ROW_SUM
    FROM TABLE_NAME A) C
  WHERE C.ROW_SUM = 0
  GROUP BY C.ART_ID) D
ON 
  (D.ART_ID = E.ART_ID) AND
  (E.TS >= D.MAX_TS)

First we calculate running sums for every row:

SELECT 
  ART_ID,
  TS,
  COALESCE((SELECT SUM(AMOUNT) FROM TABLE_NAME B WHERE (B.Art_id = A.Art_id) AND (B.Ts < A.Ts)),0) ROW_SUM
FROM TABLE_NAME A

Then we look for last article with 0:

SELECT
  C.ART_ID,
  MAX(TS) MAX_TS
FROM
  (SELECT 
    ART_ID,
    TS,
    COALESCE((SELECT SUM(AMOUNT) FROM TABLE_NAME B WHERE (B.Art_id = A.Art_id) AND (B.Ts < A.Ts)),0) ROW_SUM
  FROM TABLE_NAME A) C
WHERE C.ROW_SUM = 0
GROUP BY C.ART_ID
LukLed
This works! Differences with my answer: you don't show articles with just 1 row; I assume the id's are increasing with time; you assume no two rows have the exact same TS date.
Andomar
'you don't show articles with just 1 row' - I don't understand.'you assume no two rows have the exact same TS date' - Yes, because you didn't write how should mechanism behave in this situation.
LukLed
If You assume that ID's are increasing with time, You can change (B.Ts <= A.Ts) to (B.Id <= A.Id). This solves the problem with the same date. Lower id is first.
LukLed
Thank you on your time and help at my time zone now is one past midnight and temporally I am not able to test queries, You make may day tomorrow it is Sunday and wife is again going to crazy becuse I work on weekends. Regarding to same date I also have real time stamp of record with milliseconds and Ill try implement and that, which is maybe better solution than hunting higher ID, Oce again thanks on your time
adopilot
@LukLeg: Say article 4 had only 1 row (10,'2009-01-18',4,1), that row would not show up in your query
Andomar
Now it is ok. I correted it.
LukLed
+1  A: 

You can find all rows where the running sum is zero with:

select cur.id, cur.art_id
from @articles cur
left join @articles prev
 on prev.art_id = cur.art_id
 and prev.id <= cur.id
group by cur.id, cur.art_id
having sum(prev.amount) = 0

Then you can query all rows that come after the rows with a zero running sum:

select a.*
from @articles a
left join (
    select cur.id, cur.art_id, running = sum(prev.amount)
    from @articles cur
    left join @articles prev
     on prev.art_id = cur.art_id
     and prev.ts <= cur.ts
    group by cur.id, cur.art_id
    having sum(prev.amount) = 0
) later_zero_running on
    a.art_id = later_zero_running.art_id
    and a.id <= later_zero_running.id
where later_zero_running.id is null

The LEFT JOIN in combination with the WHERE says: there can not be a row after this row, where the running sum is zero.

Andomar
+2  A: 

In this solution I simply want to find all the rows where there isn't a subsequent row for that Art_id where the running sum was 0. I am assuming we can use the ID as a better tiebreaker than TS, since two rows can come in with the same timestamp but they will get sequential identity values.

;WITH base AS
(
    SELECT
     ID, Art_id, TS, Amount,
     RunningSum = Amount + COALESCE
     (
      (
               SELECT SUM(Amount)
       FROM dbo.foo
       WHERE Art_id = f.Art_id 
       AND ID < f.ID
      )
      , 0
     )
    FROM dbo.[table name] AS f
)
SELECT ID, Art_id, TS, Amount
FROM base AS b1
WHERE NOT EXISTS
(
    SELECT 1 
    FROM base AS b2 
    WHERE Art_id = b1.Art_id
    AND ID >= b1.ID
    AND RunningSum = 0
)
ORDER BY ID;
Aaron Bertrand