I have a database issue that I currently can't wrap my head around an easy solution for. In my DB I have a table that stores event values: 0's and 1's, with a timestamp. The issue is that, as a business rule, the same event can occur twice in a row, like below:

  • '2008-09-22 16:28:14.133', 0
  • '2008-09-22 16:28:35.233', 1
  • '2008-09-22 16:29:16.353', 1
  • '2008-09-22 16:31:37.273', 0
  • '2008-09-22 16:35:43.134', 0
  • '2008-09-22 16:36:39.633', 1
  • '2008-09-22 16:41:40.733', 0

In real life these events are cycled, and I'm trying to query to get the cycles, but I need to ignore the duplicate values (the 1, 1). The current solution uses a SQL cursor to loop over each row and throw out a value if the previous one was the same. I've considered using a trigger on insert to clean up into a post-processed table, but I can't think of an easy way to do this set-based.
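
Roughly, the current cursor-based approach looks like the following (a simplified sketch only; yourtable, yourtable_clean, and the column names are placeholders guessed from the sample data, not the actual schema):

DECLARE @ts DATETIME, @value INT, @prev INT;   -- @prev starts out NULL

DECLARE event_cursor CURSOR FOR
    SELECT [timestamp], [value] FROM yourtable ORDER BY [timestamp];

OPEN event_cursor;
FETCH NEXT FROM event_cursor INTO @ts, @value;
WHILE @@FETCH_STATUS = 0
BEGIN
    -- Keep the row only when its value differs from the previous row's value
    IF @prev IS NULL OR @value <> @prev
        INSERT INTO yourtable_clean ([timestamp], [value]) VALUES (@ts, @value);

    SET @prev = @value;
    FETCH NEXT FROM event_cursor INTO @ts, @value;
END;

CLOSE event_cursor;
DEALLOCATE event_cursor;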

Any ideas or suggestions?

Thanks

+1  A: 

(Preface: I've only done this in Oracle, but I'm pretty sure it's all possible in any DB that supports triggers.)

Have a before insert trigger that selects the row with the max timestamp value. If that row's value is the same as the one you wish to insert, ignore it.

This should keep them all in a correct state.

Now, if you need both sets of states stored, the trigger can always insert on the all-inclusive table, but do the lookup and insert on the 'filtered' table only when the value changes.
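
A minimal T-SQL sketch of that two-table idea (yourtable and yourtable_filtered are placeholder names, and it assumes rows arrive one at a time in timestamp order):

CREATE TRIGGER trg_filter_duplicates
ON yourtable
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;

    -- Copy the new row into the filtered table only if its value differs from
    -- the latest filtered value (-1 is a dummy "no previous row" value).
    INSERT INTO yourtable_filtered ([timestamp], [value])
    SELECT i.[timestamp], i.[value]
    FROM inserted AS i
    WHERE i.[value] <> ISNULL(
        (SELECT TOP 1 f.[value]
         FROM yourtable_filtered AS f
         ORDER BY f.[timestamp] DESC), -1);
END;

A multi-row INSERT would need extra handling, since rows inside the same batch aren't compared against each other here.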

tyshock
A: 

Just so that I understand the problem.

You have, if you order the row set by timestamp, duplicate values sometimes occurring next to each other, like the pair of 1's in the 2nd and 3rd items above, and the double 0's in the 4th and 5th, is that it?

And you want the last of the corresponding pair (or sequence if there are more than 2)?

Why do you need to remove them? I'm asking because, unless they make up a significant share of the table, it might be easier to filter them out sequentially, as you do now, whenever you need to process or display the data.

A solution, though not a very good one, would be to retrieve the minimum timestamp greater than the timestamp of the row you're examining, then retrieve the value of that row, and if it's the same as the current row's value, not return the current row.

Here's the SQL to get everything:

SELECT timestamp, value
FROM yourtable

And here's how to join in to get the minimum timestamp above the current one:

SELECT T1.timestamp, MIN(T2.timestamp) AS next_timestamp, T1.value
FROM yourtable T1, yourtable T2
WHERE T2.timestamp > T1.timestamp
GROUP BY T1.timestamp, T1.value

(I fear the above query will be horribly slow)

And then, to retrieve the value corresponding to that minimum timestamp:

SELECT T3.timestamp, T3.value
FROM (
    SELECT T1.timestamp, MIN(T2.timestamp) AS next_timestamp, T1.value
    FROM yourtable T1, yourtable T2
    WHERE T2.timestamp > T1.timestamp
    GROUP BY T1.timestamp, T1.value
) T3, yourtable AS T4
WHERE T3.next_timestamp = T4.timestamp
  AND T3.value <> T4.value

Unfortunately this doesn't produce the very last row on its own, since each row needs a following value to compare against. A simple dummy sentinel value (which you can UNION in if you need to) handles that.
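
For example, the sentinel row could be supplied in the query itself with UNION ALL instead of being stored in the table (a sketch only, reusing the dummy row from the dump below):

SELECT T3.timestamp, T3.value
FROM (
    SELECT T1.timestamp, MIN(T2.timestamp) AS next_timestamp, T1.value
    FROM yourtable T1,
         (SELECT timestamp, value FROM yourtable
          UNION ALL
          SELECT '2099-12-31 23:59:59.999', 2) T2
    WHERE T2.timestamp > T1.timestamp
    GROUP BY T1.timestamp, T1.value
) T3,
     (SELECT timestamp, value FROM yourtable
      UNION ALL
      SELECT '2099-12-31 23:59:59.999', 2) T4
WHERE T3.next_timestamp = T4.timestamp
  AND T3.value <> T4.value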

Here's the sqlite database dump I tested the above query against:

BEGIN TRANSACTION;
CREATE TABLE yourtable (timestamp datetime, value int);
INSERT INTO "yourtable" VALUES('2008-09-22 16:28:14.133',0);
INSERT INTO "yourtable" VALUES('2008-09-22 16:28:35.233',1);
INSERT INTO "yourtable" VALUES('2008-09-22 16:29:16.353',1);
INSERT INTO "yourtable" VALUES('2008-09-22 16:31:37.273',0);
INSERT INTO "yourtable" VALUES('2008-09-22 16:35:43.134',0);
INSERT INTO "yourtable" VALUES('2008-09-22 16:36:39.633',1);
INSERT INTO "yourtable" VALUES('2008-09-22 16:41:40.733',0);
INSERT INTO "yourtable" VALUES('2099-12-31 23:59:59.999',2);
COMMIT;

And here is the (formatted) output:

timestamp                 value
2008-09-22 16:28:14.133   0
2008-09-22 16:29:16.353   1
2008-09-22 16:35:43.134   0
2008-09-22 16:36:39.633   1
2008-09-22 16:41:40.733   0
Lasse V. Karlsen
A: 

This problem is really a data capture problem. A typical database engine is not a good choice to solve it. A simple preprocessor should detect the change in the input data set and store only the relevant data (time stamp, etc.).

An easy solution within the database environment (for example, in Oracle) is to create a package with local memory variables that store the last input data set, eliminating unneeded database access.

Of course, you can use all the power of the database environment to define what counts as a "change in the input data set" and to store the filtered data, so it can be as easy or as complex as you wish.
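
A rough PL/SQL sketch of that idea (filtered_events, the column names, and the procedure name are hypothetical, not from the question):

-- Hypothetical sketch: a package-level variable caches the last value seen,
-- so only state changes touch the filtered table.
CREATE OR REPLACE PACKAGE event_capture AS
  PROCEDURE record_event(p_ts IN TIMESTAMP, p_value IN NUMBER);
END event_capture;
/

CREATE OR REPLACE PACKAGE BODY event_capture AS
  g_last_value NUMBER := NULL;   -- last value seen in this session

  PROCEDURE record_event(p_ts IN TIMESTAMP, p_value IN NUMBER) IS
  BEGIN
    IF g_last_value IS NULL OR g_last_value <> p_value THEN
      INSERT INTO filtered_events (event_ts, event_value)
      VALUES (p_ts, p_value);
    END IF;
    g_last_value := p_value;
  END record_event;
END event_capture;
/

Note that package variables live per session, so each concurrent loader session keeps its own "last value"; this fits a single data-capture process best.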

A: 

This uses a SQL Server common table expression (but it can be inlined as a derived table), with table t having columns dt and cyclestate:

-- Firsts: for each row, find the timestamp of the next row with a different state
;WITH Firsts AS (
    SELECT t1.dt
        ,MIN(t2.dt) AS Prevdt
    FROM t AS t1
    INNER JOIN t AS t2
        ON t1.dt < t2.dt
        AND t2.cyclestate <> t1.cyclestate
    GROUP BY t1.dt
)
-- For each state change, pair its earliest leading row (dt_start) with the
-- change itself (dt_end), keeping only cycles that start in state 0
SELECT MIN(t1.dt) AS dt_start
    ,t2.dt AS dt_end
FROM t AS t1
INNER JOIN Firsts
    ON t1.dt = Firsts.dt
INNER JOIN t AS t2
    ON t2.dt = Firsts.Prevdt
    AND t1.cyclestate <> t2.cyclestate
GROUP BY t2.dt
    ,t2.cyclestate
HAVING MIN(t1.cyclestate) = 0
Cade Roux
A: 

Thanks for the answers, guys. I'm going to look at each and see what works best for my situation. Thanks for the suggestions!

Bobby