
I have a database table with a date column. Some of the rows should share the same date, but because of lag during insertion there is a one-second difference between them. The insert code has already been fixed, but the existing data in the table needs to be fixed as well.

For example, the table contains the following data:

2008-10-08 12:23:01   1   1   x
2008-10-08 12:23:01   1   2   y
2008-10-08 12:23:02   1   3   z

Now I want to update the last row in this example and set the date to '2008-10-08 12:23:01'.

Any suggestions?

A: 

For all rows:

UPDATE yourtable SET date_added = date_added - INTERVAL 1 SECOND;

For a specific row, add a WHERE clause.
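A minimal sketch of the single-row variant, using Python's built-in sqlite3 module as a stand-in for the real database. The table `yourtable`, its `id` key and the `date_added` column are hypothetical names. SQLite has no INTERVAL arithmetic, so datetime(date_added, '-1 second') does the subtraction here.

```python
import sqlite3

# In-memory stand-in for the real table; `yourtable`, `id` and `date_added`
# are hypothetical names used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourtable (id INTEGER PRIMARY KEY, date_added TEXT)")
conn.executemany(
    "INSERT INTO yourtable (id, date_added) VALUES (?, ?)",
    [(1, "2008-10-08 12:23:01"), (2, "2008-10-08 12:23:01"), (3, "2008-10-08 12:23:02")],
)

# Shift only the late row back by one second; the WHERE clause limits the
# update to that row instead of the whole table.
conn.execute(
    "UPDATE yourtable SET date_added = datetime(date_added, '-1 second') WHERE id = ?",
    (3,),
)

dates = [d for (d,) in conn.execute("SELECT date_added FROM yourtable ORDER BY id")]
```

As the comments point out, the hard part in practice is choosing which ids go into the WHERE clause.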

halocursed
Well, this UPDATE statement isn't the hard part; the hard part is the WHERE clause that selects the specific rows.
pbean
Do you only want to update the last row, all the rows containing some specific value, or all rows inserted before you fixed the insert delay?
halocursed
A: 

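The best way I can think of is writing an external script to do that. It's tricky to determine which rows are correct and which should be updated without more control over the grouping. A sketch in Python (find_updates and max_gap, the "X seconds" threshold, are illustrative names):

from datetime import timedelta

def find_updates(rows, max_gap=timedelta(seconds=1)):
    # rows: (id, date) pairs from SELECT id, date FROM table ORDER BY date
    updates = {}         # id -> corrected date
    group_start = None   # date of the first row in the current group
    for row_id, date in rows:
        if group_start is None or date - group_start > max_gap:
            group_start = date              # start a new group
        elif date != group_start:
            updates[row_id] = group_start   # late row: snap to group start
    return updates

# then, for each (row_id, new_date) in updates.items():
#     UPDATE table SET date = new_date WHERE id = row_id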
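To try the external-script approach end to end, here is a self-contained sketch against an in-memory SQLite table. Assumptions: the table is named `tbl` with an integer `id` and a text `dt` column, and a row counts as late if it is at most `max_gap_seconds` (a hypothetical parameter standing in for "X seconds") after the first row of its group.

```python
import sqlite3
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"

def normalize_dates(conn, max_gap_seconds=1):
    """Snap each late row to the date of the first row in its group.

    A new group starts at any row more than max_gap_seconds after the
    current group's first date. Returns the number of rows changed.
    """
    max_gap = timedelta(seconds=max_gap_seconds)
    rows = conn.execute("SELECT id, dt FROM tbl ORDER BY dt, id").fetchall()
    updates = []
    group_start = None
    for row_id, dt_text in rows:
        dt = datetime.strptime(dt_text, FMT)
        if group_start is None or dt - group_start > max_gap:
            group_start = dt  # first row of a new group keeps its date
        elif dt != group_start:
            updates.append((group_start.strftime(FMT), row_id))
    conn.executemany("UPDATE tbl SET dt = ? WHERE id = ?", updates)
    return len(updates)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, dt TEXT)")
conn.executemany("INSERT INTO tbl (dt) VALUES (?)", [(d,) for d in [
    "2008-10-08 12:23:01", "2008-10-08 12:23:01", "2008-10-08 12:23:02",
    "2008-10-08 12:23:05", "2008-10-08 12:23:06",
]])
changed = normalize_dates(conn)
result = [dt for (dt,) in conn.execute("SELECT dt FROM tbl ORDER BY id")]
```

All rows are read before any update runs, so earlier corrections cannot shift how later rows are grouped.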

Alternatively, something like this could work, but you might need more than one run if you want to handle cases where all three dates are different and you want to normalize two of them to the first one.

UPDATE
   tbl t,
   (SELECT
        t.date,
        (SELECT min(date)
         FROM tbl
         WHERE timestampdiff(SECOND,date,t.date) BETWEEN 1 AND 3) AS new_date
    FROM tbl t) t2
SET t.date=t2.new_date
WHERE t.date=t2.date AND t2.new_date IS NOT NULL
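A rough SQLite translation of the query above, for experimenting without a MySQL server. Assumptions: the table `tbl` has an `id` key and a text `dt` column (hypothetical names), and CAST(strftime('%s', ...) AS INTEGER) differences replace TIMESTAMPDIFF(SECOND, ...). As with the derived table t2, every new date is computed before anything is updated.

```python
import sqlite3

# For each row, the earliest date that lies 1 to 3 seconds before it
# (NULL when there is none), mirroring the new_date subquery.
NEW_DATES_SQL = """
SELECT t.id,
       (SELECT min(o.dt) FROM tbl o
        WHERE CAST(strftime('%s', t.dt) AS INTEGER)
              - CAST(strftime('%s', o.dt) AS INTEGER) BETWEEN 1 AND 3) AS new_date
FROM tbl t
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, dt TEXT)")
conn.executemany("INSERT INTO tbl (dt) VALUES (?)", [(d,) for d in [
    "2008-10-08 12:23:01", "2008-10-08 12:23:01", "2008-10-08 12:23:02",
    "2008-10-08 12:23:10", "2008-10-08 12:23:11",
]])

# Materialize the (new_date, id) pairs first, then apply them.
pending = [(new, row_id) for row_id, new in conn.execute(NEW_DATES_SQL) if new is not None]
conn.executemany("UPDATE tbl SET dt = ? WHERE id = ?", pending)
result = [dt for (dt,) in conn.execute("SELECT dt FROM tbl ORDER BY id")]
```

The 1 to 3 second window is wide: a set that starts 3 seconds after another set would be pulled back onto it, so it is worth checking the SELECT output before running the update.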
Lukáš Lalinský
I ran the SELECT part of your proposed UPDATE query against the table, and it seems to list the right (new) dates for all values, so it's exactly what I wanted. I had already tried a similar query, but it returned a couple of thousand rows too many, while this one returns exactly the right number.
pbean
A: 

due to lag in insertion

Why not get the date once, before inserting/updating the first row, and use it for all the other rows?
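A minimal sketch of that idea, assuming the rows of one batch are inserted together from Python through sqlite3. The table `tbl`, its `dt` and `val` columns, and the helper `insert_batch` are hypothetical. The timestamp is taken once, so every row in the batch gets the same value however long the inserts take.

```python
import sqlite3
from datetime import datetime

def insert_batch(conn, values, now=None):
    """Insert all values with one shared timestamp taken before the first insert."""
    stamp = (now or datetime.now()).strftime("%Y-%m-%d %H:%M:%S")
    with conn:  # one transaction for the whole batch
        conn.executemany(
            "INSERT INTO tbl (dt, val) VALUES (?, ?)",
            [(stamp, v) for v in values],
        )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, dt TEXT, val TEXT)")
insert_batch(conn, ["x", "y", "z"])
stamps = {dt for (dt,) in conn.execute("SELECT dt FROM tbl")}
```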

Johannes Rudolph
I do that now (that was the fix), but there are still old rows in the database with the dates mixed up. A full fix would require a new database layout, but we can't do that at the moment.
pbean
A: 

Hi, try this:

Assuming you have this structure:

create table tbl(id int identity, dt datetime)
insert into tbl (dt) values('2009-10-08 12:23:01')
insert into tbl (dt) values('2009-10-08 12:23:01')
insert into tbl (dt) values('2009-10-08 12:23:02')
insert into tbl (dt) values('2009-10-08 12:23:05')
insert into tbl (dt) values('2009-10-08 12:23:05')
insert into tbl (dt) values('2009-10-08 12:23:06')

This query shows only the last item of each set, i.e. every row whose date is exactly 1 second after some other row's:

select distinct A.* from tbl A
join (select * from tbl) AS T on datediff(ss, T.dt, A.dt) = 1
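The same detection can be tried in SQLite through Python's sqlite3 module. This is a translation sketch, not the original T-SQL: DATEDIFF(ss, T.dt, A.dt) = 1 becomes a difference of CAST(strftime('%s', ...) AS INTEGER) values, and the sample rows are the six from the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, dt TEXT)")
conn.executemany("INSERT INTO tbl (dt) VALUES (?)", [(d,) for d in [
    "2009-10-08 12:23:01", "2009-10-08 12:23:01", "2009-10-08 12:23:02",
    "2009-10-08 12:23:05", "2009-10-08 12:23:05", "2009-10-08 12:23:06",
]])

# A row is "late" when some other row is exactly one second earlier.
LATE_ROWS_SQL = """
SELECT DISTINCT a.id, a.dt
FROM tbl a
JOIN tbl t
  ON CAST(strftime('%s', a.dt) AS INTEGER) - CAST(strftime('%s', t.dt) AS INTEGER) = 1
ORDER BY a.id
"""
late = conn.execute(LATE_ROWS_SQL).fetchall()
```

DISTINCT matters because a late row joins once for every row one second before it (row 3 matches both rows 1 and 2).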

Using that in conjunction with an UPDATE statement, you get this:

update A set dt = (select top 1 dt from tbl where tbl.id < A.id order by tbl.id desc)
from tbl A
join (select * from tbl) AS T on datediff(ss, T.dt, A.dt) = 1

That updates the last record of each set to the date of the record before it (by id), giving these results:

1           2009-10-08 12:23:01.000
2           2009-10-08 12:23:01.000
3           2009-10-08 12:23:01.000
4           2009-10-08 12:23:05.000
5           2009-10-08 12:23:05.000
6           2009-10-08 12:23:05.000
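For reference, a SQLite sketch of the same update, run from Python's sqlite3 module on the example rows. Assumptions: LIMIT 1 stands in for TOP 1, and the EXISTS check replaces the join on DATEDIFF. Like the T-SQL version, it assumes the row with the next-lower id belongs to the same set.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, dt TEXT)")
conn.executemany("INSERT INTO tbl (dt) VALUES (?)", [(d,) for d in [
    "2009-10-08 12:23:01", "2009-10-08 12:23:01", "2009-10-08 12:23:02",
    "2009-10-08 12:23:05", "2009-10-08 12:23:05", "2009-10-08 12:23:06",
]])

# Late rows (exactly 1 s after some other row) take the date of the row
# with the next-lower id.
conn.execute("""
UPDATE tbl
SET dt = (SELECT p.dt FROM tbl p WHERE p.id < tbl.id ORDER BY p.id DESC LIMIT 1)
WHERE EXISTS (
    SELECT 1 FROM tbl t
    WHERE CAST(strftime('%s', tbl.dt) AS INTEGER)
          - CAST(strftime('%s', t.dt) AS INTEGER) = 1
)
""")
result = conn.execute("SELECT id, dt FROM tbl ORDER BY id").fetchall()
```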

It's quick, dirty and unoptimized, but for a one-off data scrub it should work.

Remember to back up!

Wez