tags:

views:

39

answers:

2

I have views which look something like this:

mysql> select * from p2;
+---+---------+---------+
| k | measure | time_id |
+---+---------+---------+
| D |     200 |       2 |
| E |     201 |       2 |
| F |     203 |       2 |
| A |      20 |       1 |
| B |      22 |       1 |
| C |      23 |       1 |
| D |     100 |       1 |
| E |     101 |       1 |
| F |     103 |       1 |
| G |       4 |       1 |
| H |       7 |       1 |
| I |      10 |       1 |
+---+---------+---------+

(k, time_id) is a unique key, and the above is greatly simplified (there will be many more values of time_id and k). The sort order is time_id DESC (followed by k ASC, but that's not so important).

I want to find a SELECT statement that will filter it to this:

+---+---------+---------+
| k | measure | time_id |
+---+---------+---------+
| D |     200 |       2 |
| E |     201 |       2 |
| F |     203 |       2 |
| A |      20 |       1 |
| B |      22 |       1 |
| C |      23 |       1 |
| G |       4 |       1 |
| H |       7 |       1 |
| I |      10 |       1 |
+---+---------+---------+

I want to make sure that values for column k are unique, by filtering out rows where the k value has already been used before.

In this example, in the original view rows 0, 1, 2 contained k values D, E and F, but so did rows 6, 7, 8, so rows 6-8 are removed to make the second view.

Is there a SELECT statement that can do this? It feels like it should be straightforward, but I can't figure out how to do it.

+2  A: 
  select * from p2 e
      join (select k, Max(time_id) time_id 
            from p2
            group by k) t
      ON (e.k = t.k and e.time_id = t.time_id)
Michael Pakhantsov
+1  A: 

You may want to use a derived table:

SELECT p2.* 
FROM   p2
JOIN   (
           SELECT    MAX(time_id) max_time, k 
           FROM      p2 
           GROUP BY  k
       ) d_p2 ON (d_p2.k = p2.k AND d_p2.max_time = p2.time_id);

Or you could also use the "null-self-join" method:

SELECT    p2.*
FROM      p2
LEFT JOIN p2 AS d_p2 ON d_p2.k = p2.k AND d_p2.time_id > p2.time_id
WHERE     d_p2.k IS NULL;

These should work fine as long as you are sure that time_id is unique for each k. Otherwise you could still get duplicate rows.

Test case:

CREATE TABLE p2 (k char(1), measure int, time_id int);

INSERT INTO p2 VALUES ('D', 200, 2);
INSERT INTO p2 VALUES ('E', 201, 2);
INSERT INTO p2 VALUES ('F', 203, 2);
INSERT INTO p2 VALUES ('A',  20, 1);
INSERT INTO p2 VALUES ('B',  22, 1);
INSERT INTO p2 VALUES ('C',  23, 1);
INSERT INTO p2 VALUES ('D', 100, 1);
INSERT INTO p2 VALUES ('E', 101, 1);
INSERT INTO p2 VALUES ('F', 103, 1);
INSERT INTO p2 VALUES ('G',   4, 1);
INSERT INTO p2 VALUES ('H',   7, 1);
INSERT INTO p2 VALUES ('I',  10, 1);

Result:

+------+---------+---------+
| k    | measure | time_id |
+------+---------+---------+
| D    |     200 |       2 |
| E    |     201 |       2 |
| F    |     203 |       2 |
| A    |      20 |       1 |
| B    |      22 |       1 |
| C    |      23 |       1 |
| G    |       4 |       1 |
| H    |       7 |       1 |
| I    |      10 |       1 |
+------+---------+---------+
9 rows in set (0.00 sec)
Daniel Vassallo
@yassam: If you will be having many rows in your table (many thousands), make sure to test the performance of both methods, because one can be much faster than the other, depending on some circumstances: http://kristiannielsen.livejournal.com/6745.html
Daniel Vassallo