A tough SQL question (I'm using Postgres, by the way).

I need the first row inserted each day for the past X days. One of my columns is a timestamp holding the time the row was inserted; another column is the row id.

If it's not possible to get the first row inserted each day, I at least need a unique one: a single row for each of the past X days.

Any suggestions?

Thanks

okie

+1  A: 

You may want to try something like the following (tested in MySQL, but I guess it should be easy to port to Postgres):

SELECT      l.id, l.timestamp, l.value
FROM        log l
INNER JOIN  (
             -- earliest timestamp of each calendar day
             SELECT    MIN(timestamp) first_timestamp
             FROM      log
             GROUP BY  DATE(timestamp)
            ) sub_l ON (sub_l.first_timestamp = l.timestamp)
WHERE       l.timestamp > DATE_ADD(NOW(), INTERVAL -30 DAY);

Note that this assumes your timestamps are unique: if two rows share a day's earliest timestamp, both of them are returned.
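If timestamps can repeat, one way to still get exactly one row per day is a window function that ranks the rows within each day and breaks ties on `id`. A sketch for Postgres (window functions need 8.4 or later), assuming the same `log(id, timestamp, value)` table; the `ranked` and `rn` aliases are hypothetical, and using `id` as the tie-breaker is an assumption:

```sql
-- One row per calendar day: the earliest timestamp, ties broken by the lowest id.
SELECT ranked.id, ranked.timestamp, ranked.value
FROM (
    SELECT l.id, l.timestamp, l.value,
           ROW_NUMBER() OVER (
               PARTITION BY l.timestamp::date   -- one partition per calendar day
               ORDER BY l.timestamp, l.id       -- earliest first; lowest id breaks ties
           ) AS rn
    FROM log l
    WHERE l.timestamp > NOW() - INTERVAL '30 days'  -- applied before the ranking
) ranked
WHERE ranked.rn = 1
ORDER BY ranked.timestamp;
```

Because the `WHERE` runs before the window function, the oldest day in the window contributes its earliest row after the cutoff instead of being dropped, which differs slightly from the join-based query above.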

Test Case (in MySQL):

CREATE TABLE log (id int, timestamp datetime, value int);

INSERT INTO log VALUES (1, '2010-06-01 02:00:00', 100);
INSERT INTO log VALUES (2, '2010-06-01 03:00:00', 200);
INSERT INTO log VALUES (3, '2010-06-01 04:00:00', 300);
INSERT INTO log VALUES (4, '2010-06-02 02:00:00', 400);
INSERT INTO log VALUES (5, '2010-06-02 03:00:00', 500);
INSERT INTO log VALUES (6, '2010-06-03 02:00:00', 600);
INSERT INTO log VALUES (7, '2010-06-04 02:00:00', 700);
INSERT INTO log VALUES (8, '2010-06-04 03:00:00', 800);
INSERT INTO log VALUES (9, '2010-06-05 05:00:00', 900);
INSERT INTO log VALUES (10, '2010-06-05 03:00:00', 1000);

Result:

+------+---------------------+-------+
| id   | timestamp           | value |
+------+---------------------+-------+
|    1 | 2010-06-01 02:00:00 |   100 |
|    4 | 2010-06-02 02:00:00 |   400 |
|    6 | 2010-06-03 02:00:00 |   600 |
|    7 | 2010-06-04 02:00:00 |   700 |
|   10 | 2010-06-05 03:00:00 |  1000 |
+------+---------------------+-------+
5 rows in set (0.00 sec)
Daniel Vassallo
If you added a MIN(id) to the derived table and an id condition to the join, you could work around the "assumes that your timestamps are unique" caveat too.
potatopeelings
@potatopeelings: I don't think it's that easy. `SELECT MIN(timestamp), MIN(id) FROM log GROUP BY DATE(timestamp)` on the above test case would return the last row as `2010-06-05 03:00:00 | 9`. If I were to add an ID clause to the JOIN condition, it would not match, because there is no row in the table with `timestamp = '2010-06-05 03:00:00' AND id = '9'` (at least in MySQL).
Daniel Vassallo
Oh yes, you're right, my bad. It would take one more subquery or join to get the row with the lowest ID among the rows with the lowest timestamp per day. As you pointed out, it's not as easy as MIN, MIN. Sorry!
potatopeelings
@potatopeelings: No probs :) ... Yes I agree, that can be done with one additional subquery. If the OP requires that, I'll update my answer.
Daniel Vassallo
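The additional subquery discussed in these comments could look roughly like the following: find each day's earliest timestamp, take the lowest `id` among the rows at that timestamp, and join back on `id`. A sketch assuming the same `log` table and that `id` is unique; the `firsts` and `picked` aliases are hypothetical. It uses no window functions, so it should run on both MySQL and Postgres; add the same date filter as in the queries above to limit it to the past X days.

```sql
-- Lowest id among the rows that share each day's earliest timestamp.
SELECT l.id, l.timestamp, l.value
FROM log l
INNER JOIN (
    SELECT MIN(l2.id) AS first_id
    FROM log l2
    INNER JOIN (
        -- earliest timestamp of each calendar day
        SELECT MIN(timestamp) AS first_timestamp
        FROM log
        GROUP BY DATE(timestamp)
    ) firsts ON l2.timestamp = firsts.first_timestamp
    GROUP BY firsts.first_timestamp   -- one group per day
) picked ON l.id = picked.first_id
ORDER BY l.timestamp;
```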
+1  A: 

Mr. Vassallo, you're a rock star.

It worked great. Here is the Postgres version of your SQL:

SELECT      l.id, l.timestamp, l.value
FROM        log l
INNER JOIN  (
             -- earliest timestamp of each calendar day
             SELECT    MIN(timestamp) AS first_timestamp
             FROM      log
             GROUP BY  DATE(timestamp)
            ) sub_l ON (sub_l.first_timestamp = l.timestamp)
WHERE       l.timestamp > NOW() - INTERVAL '30 DAY'
ORDER BY    l.timestamp;

There is no need to get the minimal ID, because I can't guarantee that the inserts happen in strict chronological order (the timestamp is not really the insertion time but a timestamp carried inside the data, and data packets can arrive out of order).
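On Postgres specifically, `DISTINCT ON` keeps the first row of each group according to the `ORDER BY`, which yields exactly one row per day without a self-join, even when timestamps repeat. A sketch against the same `log` table; the 30-day window and `id` as the final tie-breaker are carried over from this thread as assumptions, not requirements:

```sql
-- DISTINCT ON keeps the first row per day, as ordered by the ORDER BY clause.
SELECT DISTINCT ON (l.timestamp::date)
       l.id, l.timestamp, l.value
FROM log l
WHERE l.timestamp > NOW() - INTERVAL '30 days'
-- The leading ORDER BY expression must match the DISTINCT ON expression;
-- the remaining keys decide which row of each day survives.
ORDER BY l.timestamp::date, l.timestamp, l.id;
```

The result comes back ordered by day. If the column is `timestamp with time zone`, the `::date` cast uses the session's `TimeZone` setting to decide where each day starts.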

I really appreciate the help. Thank you for taking a look at this.

okie.floyd
Sorry, that should say `SELECT MIN(timestamp) AS first_timestamp`.
okie.floyd
@okie: I'm glad this helped. And thanks for posting the Postgres version :) I've edited your answer to fix the `MIN(timestamp) AS ...` part as you suggested.
Daniel Vassallo