I have a table with stock quotes that looks something like this:

id, date, stock_id, value

Every day has several rows for each stock_id (it is automatically updated every five minutes), so the table is rather big at the moment.

How do I delete every row except the last one of each day for every stock_id?

+3  A: 

I think this will do what you want:

DELETE FROM STOCK_QUOTES
  WHERE ID NOT IN (SELECT MAX(ID) AS ID
                     FROM STOCK_QUOTES
                     GROUP BY DATE, STOCK_ID);
Bob Jarvis
As with John Rasch's answer, this will discard all but the largest `stock_id` for each `date`, which is incorrect - it should discard all but the largest `id` for each `stock_id` for each `date`.
Dathan
Ah, you're right - corrected.
Bob Jarvis
+8  A: 

The other answers don't make sure to keep at least one record per stock_id per day. The following should do what you want.

DELETE FROM StockQuotes
WHERE id NOT IN (
    SELECT MAX(id)
    FROM StockQuotes
    GROUP BY stock_id, DATE(`date`)
)

This assumes id is a sequentially auto-numbered field, and date is a datetime field that contains at least the date but may contain hour, minute, second, etc. as well.
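
Since the delete is destructive, it may be worth previewing its effect first. The following is only a sketch that reuses the same subquery with SELECT instead of DELETE; the aliases rows_to_delete, quote_day and kept_id are made up for illustration, and the table and column names are the ones assumed above.

```sql
-- How many rows the DELETE above would remove (same condition, read-only).
SELECT COUNT(*) AS rows_to_delete
FROM StockQuotes
WHERE id NOT IN (
    SELECT MAX(id)
    FROM StockQuotes
    GROUP BY stock_id, DATE(`date`)
);

-- Which rows would survive: one id per stock_id per calendar day.
SELECT stock_id, DATE(`date`) AS quote_day, MAX(id) AS kept_id
FROM StockQuotes
GROUP BY stock_id, DATE(`date`)
ORDER BY quote_day, stock_id;
```

If the second query returns more than one row for some stock_id and day, the grouping is not doing what is expected and the delete should not be run as-is.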

Dathan
This should work, assuming it's not a DateTime field.
Alan Jackson
It is indeed a datetime field. Should I change to GROUP BY DATE(date), or do I need to make some other changes?
A B
@A no problem - I've made a change that should support your datetime field.
Dathan
Can't you just use MAX(date)? That would give the last time of each day. I guess it would be slower than just using the ID, though.
Spidfire
@Spidfire MAX(date) will just return the most-recently-entered row, period. I suppose you could do `DELETE FROM StockQuotes WHERE date NOT IN ( SELECT MAX(date) FROM StockQuotes GROUP BY stock_id, DATE(date))`, but then you have the possibility that records for stocks A and B were entered at the same time (part of a batch process, maybe), and that's the most recent row for stock A, but another record for stock B was entered the same day. In that situation, this'll result in two rows being left for stock B for that day, which we don't want.
Dathan
Is this a query that should take a reeeeally long time to execute? I can't get it to work. No error or anything, it just keeps loading.
A B
@A It shouldn't, no. What may be happening is that it's recomputing the list of MAX(id) for each deletion, which makes sense, since you're modifying the underlying table. Consider creating a temporary table `temp(id int)` and running the parenthetical query from above to populate it. Then you can just `delete from stockquotes where id not in (select id from temp)`
Dathan
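The temporary-table approach from the comment above could be sketched roughly like this. It assumes MySQL and the table and column names from the answer; keep_ids is a hypothetical name standing in for the `temp` table, and the primary key on it is an added assumption meant to make the NOT IN lookup cheap.

```sql
-- 1. Hypothetical helper table holding the ids that must survive.
CREATE TEMPORARY TABLE keep_ids (id INT PRIMARY KEY);

-- 2. Compute the last id per stock_id per calendar day exactly once.
INSERT INTO keep_ids (id)
SELECT MAX(id)
FROM StockQuotes
GROUP BY stock_id, DATE(`date`);

-- 3. Delete everything else. The subquery now reads keep_ids,
--    not the table that is being modified.
DELETE FROM StockQuotes
WHERE id NOT IN (SELECT id FROM keep_ids);

-- 4. Clean up (the table would also disappear when the session ends).
DROP TEMPORARY TABLE keep_ids;
```

Running the statements in one session matters, since a temporary table is only visible to the connection that created it. On a very large table it is probably sensible to try this on a copy first.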