I have a table, let's call it 'entries', that looks like this (simplified):

id [pk]
user_id [fk]
created [date]
processed [boolean, default false]

I want to create an UPDATE query that sets the processed flag to true on all entries except the latest 3 for each user (latest in terms of the created column). For example, given the following entries:

1,456,2009-06-01,false
2,456,2009-05-01,false
3,456,2009-04-01,false
4,456,2009-03-01,false

Only entry 4 would have its processed flag changed to true.

Does anyone know how I can do this?

+3  A: 

I don't know PostgreSQL, but this is standard SQL and may work for you.

update entries set
  processed = true
where (
  -- how many of this user's entries are newer than the current row?
  select count(*)
  from entries as E
  where E.user_id = entries.user_id
  and E.created > entries.created
) >= 3;

In other words, set processed to true whenever there are three or more entries for the same user_id with later created dates. I'm assuming the [created] column is unique for a given user_id. If not, you'll need an additional criterion to pin down what you mean by "latest".
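A minimal sketch for trying the update above, assuming Python's built-in sqlite3 module as a stand-in for PostgreSQL (SQLite stores booleans as 0/1, so the sketch writes `processed = 1` where Postgres would take `true`). The rows are the question's four entries for user 456 plus two hypothetical entries for a user 789; `make_db`, `processed_ids` and `UPDATE_SQL` are hypothetical helper names. The second run illustrates the uniqueness caveat: tied created values are never "later" than one another, so nothing gets marked.

```python
import sqlite3

# The correlated-count update from the answer; 1 stands in for true here.
UPDATE_SQL = """
update entries set
  processed = 1
where (
  select count(*)
  from entries as E
  where E.user_id = entries.user_id
    and E.created > entries.created
) >= 3
"""


def make_db(rows):
    """Hypothetical helper: in-memory copy of the question's simplified table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table entries ("
        " id integer primary key,"
        " user_id integer not null,"
        " created date not null,"
        " processed boolean not null default 0)"
    )
    conn.executemany(
        "insert into entries (id, user_id, created) values (?, ?, ?)", rows
    )
    return conn


def processed_ids(conn):
    """Hypothetical helper: ids whose processed flag is set, in id order."""
    return [row[0] for row in conn.execute(
        "select id from entries where processed = 1 order by id")]


# Question's rows for user 456, plus hypothetical rows for user 789.
conn = make_db([
    (1, 456, "2009-06-01"),
    (2, 456, "2009-05-01"),
    (3, 456, "2009-04-01"),
    (4, 456, "2009-03-01"),
    (5, 789, "2009-06-01"),
    (6, 789, "2009-05-01"),
])
conn.execute(UPDATE_SQL)
print(processed_ids(conn))  # prints [4]

# Uniqueness caveat: four entries sharing one created date all survive.
tied = make_db([(i, 1, "2009-01-01") for i in range(1, 5)])
tied.execute(UPDATE_SQL)
print(processed_ids(tied))  # prints []
```

User 789 has only two entries, so none of them is touched; only entry 4 has three newer siblings.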

In SQL Server you can do the following, which is a little easier to follow and will probably execute more efficiently:

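with T(id, user_id, created, processed, rk) as (
  select
    id, user_id, created, processed,
    -- number each user's entries from newest (1) to oldest; id breaks ties
    row_number() over (
      partition by user_id
      order by created desc, id
    )
  from entries
)
update T set
  processed = 1  -- T-SQL has no TRUE literal; a bit column takes 1
where rk > 3;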

Updating a CTE is a non-standard feature, and not all database systems support row_number.
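Since the updatable CTE is SQL Server specific, the same row_number() idea can be sketched as a plain `where id in (...)` subquery, which should port more easily; PostgreSQL has window functions from 8.4, and SQLite (used here through Python's sqlite3 as a stand-in) from 3.25. As before, `1` stands in for `true`, and the rows for user 789, whose four entries share one created date, are hypothetical. Because ties are broken by id, exactly three rows per user are kept even in that case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table entries (id integer primary key, user_id integer,"
    " created date, processed boolean not null default 0)"
)
# Question's rows for user 456; hypothetical tied rows for user 789.
conn.executemany(
    "insert into entries (id, user_id, created) values (?, ?, ?)",
    [
        (1, 456, "2009-06-01"),
        (2, 456, "2009-05-01"),
        (3, 456, "2009-04-01"),
        (4, 456, "2009-03-01"),
        (5, 789, "2009-06-01"),
        (6, 789, "2009-06-01"),
        (7, 789, "2009-06-01"),
        (8, 789, "2009-06-01"),
    ],
)

# Rank each user's entries newest first (ties broken by id) and mark every
# row ranked below the top 3. In PostgreSQL, write "processed = true".
conn.execute(
    """
    update entries set processed = 1
    where id in (
      select id from (
        select id,
               row_number() over (
                 partition by user_id
                 order by created desc, id
               ) as rk
        from entries
      ) as ranked
      where rk > 3
    )
    """
)

processed = [row[0] for row in conn.execute(
    "select id from entries where processed = 1 order by id")]
print(processed)  # prints [4, 8]
```

Compare this with the count-based query, which would leave all four of user 789's tied rows unprocessed.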

Steve Kass
Yes, your SQL query worked perfectly. I had tried something very similar, but it didn't work for me. I'm not sure why, since I didn't keep that query, but I think I was selecting count(*), user_id in the subquery for some reason, though I don't know why I would have done that.
Thanks. By the way, I changed > to >= after reading depesz's solution, which was like mine, except correct. :) Be sure you don't keep my original off-by-one error.
Steve Kass
+4  A: 

First, let's start with a query that lists all rows to be updated:

select e.id
from entries as e
where (
    -- count this user's entries created after the current row
    select count(*)
    from entries as e2
    where e2.user_id = e.user_id
        and e2.created > e.created
) > 2;

This lists the ids of all records that have more than 2 other records with the same user_id but a later created value.

That is, it lists all records except the latest 3 per user.

Now, we can:

update entries as e
set processed = true
where (
    -- same condition as the query above: more than 2 newer entries for this user
    select count(*)
    from entries as e2
    where e2.user_id = e.user_id
        and e2.created > e.created
) > 2;
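The two steps can be combined into a dry run followed by the real update. A minimal sketch, again assuming Python's sqlite3 as a stand-in for PostgreSQL and `1` in place of `true`; the rows and the name `PREVIEW_SQL` are hypothetical. To keep both statements on one string, the update here filters with `where id in (...)` rather than the aliased `update entries as e` form shown above, so it is a variation on the answer's query, not a copy of it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table entries (id integer primary key, user_id integer,"
    " created date, processed boolean not null default 0)"
)
# Hypothetical data: five entries for user 456, three for user 789.
conn.executemany(
    "insert into entries (id, user_id, created) values (?, ?, ?)",
    [
        (1, 456, "2009-06-01"),
        (2, 456, "2009-05-01"),
        (3, 456, "2009-04-01"),
        (4, 456, "2009-03-01"),
        (5, 456, "2009-02-01"),
        (6, 789, "2009-06-01"),
        (7, 789, "2009-05-01"),
        (8, 789, "2009-04-01"),
    ],
)

# The preview query: ids with more than 2 newer entries for the same user.
PREVIEW_SQL = """
select e.id
from entries as e
where (
    select count(*)
    from entries as e2
    where e2.user_id = e.user_id
        and e2.created > e.created
) > 2
"""

# Step 1: dry run, inspect which rows would change.
to_update = sorted(row[0] for row in conn.execute(PREVIEW_SQL))
print(to_update)  # prints [4, 5]

# Step 2: flag exactly those ids (PostgreSQL: processed = true).
conn.execute(f"update entries set processed = 1 where id in ({PREVIEW_SQL})")
processed = [row[0] for row in conn.execute(
    "select id from entries where processed = 1 order by id")]
print(processed)  # prints [4, 5]
```

User 789 has exactly three entries, so the newest three rule leaves all of them alone.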

One thing, though: it can be slow. In that case you might be better off with a custom aggregate or, if you're on PostgreSQL 8.4 or later, window functions.

depesz