Assuming I have a table foo where I have something like this:

id, user_id, timestamp, some_value

What I want to do is remove all rows that aren't the newest N per user.

The deletion itself could be handled by a:

DELETE FROM foo WHERE id NOT IN (...)

so you could rephrase the problem as: how do I get the newest N rows (there might be fewer) for each user? This means that with U users I may end up with up to N*U rows, so a plain LIMIT won't really work.

A: 

First, get the total number of rows using this:

SELECT COUNT(*) as total FROM foo WHERE id NOT IN (...)

Then try this:

DELETE FROM foo WHERE id NOT IN (...) ORDER BY timestamp ASC LIMIT (Count - N)

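replacing Count with the total from the first query and N with the number of rows to keep. Compute Count - N in your application first, since MySQL's LIMIT does not accept an expression. This will delete all except the newest N rows. For example, if there are a total of 100 rows and you want to keep the newest 5, this will delete the oldest 95 (100 - 5) rows.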

Click Upvote
I'm sorry for being unclear, I meant N rows for each user_id
tliff
In that case you would add a condition to the query: WHERE user_id = <the user's id>. Assuming you have a list of all the user ids in an array, you'd loop over it and execute the query once for each user.
Click Upvote
A: 

DELETE FROM foo WHERE id NOT IN ( SELECT id FROM foo ORDER BY timestamp DESC LIMIT N )

Edit:

I misunderstood the question. You want to keep N records for each user. Maybe this:

SELECT DISTINCT user_id FROM foo

Then for each user_id (as currentID):

DELETE FROM foo WHERE user_id=currentID AND id NOT IN ( SELECT id FROM foo WHERE user_id=currentID ORDER BY timestamp DESC LIMIT N )

(I'm not very sure about the syntax, but I hope the idea is clear.)

Aziz
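The per-user loop described in this answer can be sketched as follows. This is only an illustration under assumptions that are not in the thread: it uses Python with an in-memory SQLite database, the function name prune_per_user and the sample rows are made up, and id is added as a tie-breaker for rows with equal timestamps. SQLite accepts LIMIT inside an IN subquery; MySQL has historically rejected that form, so on MySQL the inner SELECT would probably have to be run separately and its ids passed back in.

```python
import sqlite3


def prune_per_user(conn, n):
    """Keep only the newest n rows per user_id, one DELETE per user."""
    # Read the user list up front so we are not iterating a cursor on foo
    # while deleting from it.
    user_ids = [row[0] for row in conn.execute("SELECT DISTINCT user_id FROM foo")]
    for user_id in user_ids:
        conn.execute(
            """
            DELETE FROM foo
            WHERE user_id = ?
              AND id NOT IN (
                  SELECT id FROM foo
                  WHERE user_id = ?
                  ORDER BY timestamp DESC, id DESC
                  LIMIT ?
              )
            """,
            (user_id, user_id, n),
        )
    conn.commit()


# Hypothetical sample data: user 1 has five rows, user 2 has two rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE foo (id INTEGER PRIMARY KEY, user_id INTEGER, "
    "timestamp INTEGER, some_value TEXT)"
)
rows = [(1, t, "v%d" % t) for t in range(1, 6)] + [(2, 10, "a"), (2, 11, "b")]
conn.executemany(
    "INSERT INTO foo (user_id, timestamp, some_value) VALUES (?, ?, ?)", rows
)

prune_per_user(conn, 3)
print([r[0] for r in conn.execute("SELECT id FROM foo ORDER BY id")])
# [3, 4, 5, 6, 7]
```

User 1 loses its two oldest rows, while user 2, which has fewer than N rows, is left untouched. The cost is one round trip per user, which may matter when there are many users.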
I'm sorry for being unclear, I meant N rows for each user_id
tliff
+2  A: 

MySQL does not let a statement modify a table with UPDATE or DELETE and read from that same table in a subquery of the same statement (it fails with error 1093). So doing what you want in one statement is going to be tricky.

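I would do it in two stages: first, query the newest $N records per user (treating a higher id as a newer row), and store them in a temporary table:

CREATE TEMPORARY TABLE foo_top_n
  SELECT f1.id
  FROM foo f1 LEFT OUTER JOIN foo f2
    -- f2 matches the rows of the same user that are newer (higher id) than f1
    ON (f1.user_id = f2.user_id AND f1.id < f2.id)
  GROUP BY f1.id
  -- COUNT(f2.id) skips the NULL row of an unmatched outer join, so the newest
  -- row counts 0; COUNT(*) would count it as 1 and keep nothing when $N = 1
  HAVING COUNT(f2.id) < $N;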

Next, use the multi-table DELETE syntax and join foo to the temporary table, deleting where no match is found:

DELETE f1 FROM foo f1 LEFT OUTER JOIN foo_top_n f2 USING (id)
WHERE f2.id IS NULL;
Bill Karwin
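The two stages can be tried out end to end with a small script. This is a sketch under assumptions that are not in the answer: it runs against an in-memory SQLite database from Python, the function name prune_with_temp_table and the sample rows are hypothetical, and because SQLite has no multi-table DELETE the second stage uses NOT IN against the temporary table instead of the LEFT OUTER JOIN form. As in the answer, a higher id is taken to mean a newer row.

```python
import sqlite3


def prune_with_temp_table(conn, n):
    """Keep the n rows with the highest id per user, in two stages."""
    # Stage 1: collect the ids that have fewer than n newer rows
    # (rows with a higher id) for the same user.
    conn.execute("CREATE TEMPORARY TABLE foo_top_n (id INTEGER PRIMARY KEY)")
    conn.execute(
        """
        INSERT INTO foo_top_n (id)
        SELECT f1.id
        FROM foo f1 LEFT OUTER JOIN foo f2
          ON f1.user_id = f2.user_id AND f1.id < f2.id
        GROUP BY f1.id
        HAVING COUNT(f2.id) < ?
        """,
        (n,),
    )
    # Stage 2: delete every row whose id was not collected.
    conn.execute("DELETE FROM foo WHERE id NOT IN (SELECT id FROM foo_top_n)")
    conn.execute("DROP TABLE foo_top_n")
    conn.commit()


# Hypothetical sample data: user 1 has five rows, user 2 has two rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE foo (id INTEGER PRIMARY KEY, user_id INTEGER, "
    "timestamp INTEGER, some_value TEXT)"
)
rows = [(1, t, "v%d" % t) for t in range(1, 6)] + [(2, 10, "a"), (2, 11, "b")]
conn.executemany(
    "INSERT INTO foo (user_id, timestamp, some_value) VALUES (?, ?, ?)", rows
)

prune_with_temp_table(conn, 2)
print([r[0] for r in conn.execute("SELECT id FROM foo ORDER BY id")])
# [4, 5, 6, 7]
```

The self-join compares every row with every other row of the same user, so an index on (user_id, id) is likely to help on larger tables.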
+2  A: 

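Actually, it is possible to do it in a single query. This example keeps the newest 3 rows per user: LIMIT 2, 1 picks each user's 3rd-newest row as the cutoff, so use LIMIT N-1, 1 for other values of N: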

DELETE  l.*
FROM    foo l
JOIN    (
        SELECT  user_id,
                COALESCE(
                (
                SELECT  timestamp
                FROM    foo li
                WHERE   li.user_id = dlo.user_id
                ORDER BY
                        li.user_id DESC, li.timestamp DESC
                LIMIT 2, 1
                ), CAST('0001-01-01' AS DATETIME)) AS mts,
                COALESCE(
                (
                SELECT  id
                FROM    foo li
                WHERE   li.user_id = dlo.user_id
                ORDER BY
                        li.user_id DESC, li.timestamp DESC, li.id DESC
                LIMIT 2, 1
                ), -1) AS mid
        FROM    (
                SELECT  DISTINCT user_id
                FROM    foo dl
                ) dlo
        ) lo
ON      l.user_id = lo.user_id
        AND (l.timestamp, l.id) < (mts, mid)

See detailed explanations here:

Quassnoi
+1 awesome query
Bigbohne
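The idea behind the single-query answer (find each user's N-th newest row and delete everything strictly older than it, comparing (timestamp, id) pairs) can be sketched in a form that is easy to run locally. This is not Quassnoi's statement: it is an adaptation for SQLite, which has no DELETE ... JOIN, so the cutoff is looked up with a correlated subquery instead. The function name prune_single_statement and the sample rows are hypothetical, and the row-value comparison assumes SQLite 3.15 or later.

```python
import sqlite3


def prune_single_statement(conn, n):
    """Keep the newest n rows per user with one DELETE statement."""
    # The subquery returns the (timestamp, id) of the user's n-th newest row.
    # For users with fewer than n rows it returns nothing, the comparison
    # evaluates to NULL, and none of their rows are deleted.
    conn.execute(
        """
        DELETE FROM foo
        WHERE (timestamp, id) < (
            SELECT li.timestamp, li.id
            FROM foo AS li
            WHERE li.user_id = foo.user_id
            ORDER BY li.timestamp DESC, li.id DESC
            LIMIT 1 OFFSET ?
        )
        """,
        (n - 1,),
    )
    conn.commit()


# Hypothetical sample data: user 1 has five rows, user 2 has two rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE foo (id INTEGER PRIMARY KEY, user_id INTEGER, "
    "timestamp INTEGER, some_value TEXT)"
)
rows = [(1, t, "v%d" % t) for t in range(1, 6)] + [(2, 10, "a"), (2, 11, "b")]
conn.executemany(
    "INSERT INTO foo (user_id, timestamp, some_value) VALUES (?, ?, ?)", rows
)

prune_single_statement(conn, 3)
print([r[0] for r in conn.execute("SELECT id FROM foo ORDER BY id")])
# [3, 4, 5, 6, 7]
```

Using id as the second element of the pair gives a deterministic cutoff when several rows of one user share the same timestamp.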