views:

194

answers:

3

I have the following mysql table called "pics", with the following fields and sample data:

id   vehicle_id    filename    priority
1    45            a.jpg       4
2    45            b.jpg       1
3    56            f.jpg       4
4    67            cc.jpg      4
5    45            kt.jpg      3
6    67            gg.jpg      1

Is it possible, in a single query, to get one row for each vehicle_id, and the row be the highest priority?

The result I'm looking for:

array (
  [0] => array( [id] => '2', [vehicle_id] => '45', [filename] => 'b.jpg',  [priority] => '1' ),
  [1] => array( [id] => '3', [vehicle_id] => '56', [filename] => 'f.jpg',  [priority] => '4' ),
  [2] => array( [id] => '6', [vehicle_id] => '67', [filename] => 'gg.jpg', [priority] => '1' )
);

If not possible in a single query, what would be the best approach?

Thanks!

+4  A: 

This seems to be a typical max per group query. In most databases you could easily do this using ROW_NUMBER:

SELECT id, vehicle_id, filename, priority
FROM (
    SELECT 
        id, vehicle_id, filename, priority,
        ROW_NUMBER() OVER (PARTITION BY vehicle_id
                           ORDER BY priority DESC, id) AS rn
    FROM pics
) AS T1
WHERE rn = 1

Since MySQL doesn't yet support ROW_NUMBER you can emulate it using variables:

SELECT id, vehicle_id, filename, priority
FROM (
    SELECT
        id, vehicle_id, filename, priority,
        @rn := CASE WHEN @prev_vehicle_id = vehicle_id
                    THEN @rn + 1
                    ELSE 1
               END AS rn,
        @prev_vehicle_id := vehicle_id
    FROM (SELECT @prev_vehicle_id := NULL) vars, pics T1
    ORDER BY vehicle_id, priority DESC, id
) T2
WHERE rn = 1
Mark Byers
Truth be told, I almost never get votes for these :/
OMG Ponies
For some reason, people find variables scary in MySQL indeed, I guess it's the unfamiliarity with the '@'s and ':=' compared to ANSI SQL which scare them off. It shouldn't of course, variables can be a major boost to performance.
Wrikken
+3  A: 

Note that this method was posted before it was made clear that it is required to handle priority ties. I'm leaving it here for reference (see comments below). Check out @Mark's answer for a solution that handles ties as required:

SELECT p.id, p.vehicle_id, p.filename, p.priority
FROM   pics p
JOIN   (
           SELECT   vehicle_id, MAX(priority) max_priority
           FROM     pics
           GROUP BY vehicle_id
       ) sub_p ON (sub_p.vehicle_id = p.vehicle_id AND 
                   sub_p.max_priority = p.priority)
GROUP BY p.vehicle_id;

This assumes that there can be no priority ties for the same vehicle_id.

Test case:

CREATE TABLE pics (id int, vehicle_id int, filename varchar(10), priority int);

INSERT INTO pics VALUES ('1', '45', 'a.jpg', '4');
INSERT INTO pics VALUES ('2', '45', 'b.jpg', '1');
INSERT INTO pics VALUES ('3', '56', 'f.jpg', '4');
INSERT INTO pics VALUES ('4', '67', 'cc.jpg', '4');
INSERT INTO pics VALUES ('5', '45', 'kt.jpg', '3');
INSERT INTO pics VALUES ('6', '67', 'gg.jpg', '1');

Result:

+------+------------+----------+----------+
| id   | vehicle_id | filename | priority |
+------+------------+----------+----------+
|    1 |         45 | a.jpg    |        4 |
|    3 |         56 | f.jpg    |        4 |
|    4 |         67 | cc.jpg   |        4 |
+------+------------+----------+----------+
3 rows in set (0.01 sec)
Daniel Vassallo
This only works if they need a single record, min/max at that. This won't work if they want top 5/10/etc per.
OMG Ponies
@OMG: That's true. But I think that's the only disadvantage, isn't it? ... The OP doesn't seem to require that: "the row be the highest priority"
Daniel Vassallo
He did say he explicitly that he only wanted one row so I don't think that's a serious problem. A more serious problem is that this can return more than one row, though it is fixable.
Mark Byers
Never know what someone's requirements are, never too late for a bait'n'switch.
OMG Ponies
PS: Check the question again - he just posted his requirements for a tie break - "first posted". Looks like you will need that fix after all...
Mark Byers
@Mark: Yes, I updated my answer so that it is clear that this assumes that there can be no priority ties... I tend to avoid using variables in MySQL unless strictly necessary. However I'm not sure if this "tendency" is justified. If the case won't have priority ties, would you still go for the variables approach? (just curious)
Daniel Vassallo
@Mark: Oops... :)
Daniel Vassallo
@Daniel Vassallo: This approach should give higher performance than the variable approach, so that's one reason to go with this over the method I suggested. But this method is harder to get right. The number of columns involved in the tiebreaking changes the structure of the query. To modify my answer to include the extra column was trivial. So I think both answers have pros and cons. Shame that there is no obvious best answer to this straightforward question.
Mark Byers
@Mark: That's perfectly true. It's probably more complicated than it's worth to handle tie rules correctly.
Daniel Vassallo
+3  A: 

While this may be the 'accepted' answer, the performance of Mark's solution is under normal circumstances many times better, and equally valid for the question, so by all means, go for his solution in production!


SELECT a.id, a.vehicle_id, a.filename, a.priority
FROM pics a
LEFT JOIN pics b               -- JOIN for priority
ON b.vehicle_id = a.vehicle_id 
AND b.priority > a.priority
LEFT JOIN pics c               -- JOIN for priority ties
ON c.vehicle_id = a.vehicle_id 
AND c.priority = a.priority 
AND c.id < a.id
WHERE b.id IS NULL AND c.id IS NULL

Assuming 'id' is a non-nullable column.

[edit]: my bad, need second join, cannot do it with just one.

Wrikken
Perfect! A single query. :)
briang
Won't this be far too slow? Have you tested the performance of it? In what way are the other answers not 'a single query'?
Mark Byers
@Mark: I am a bit unfamiliar with actual performance testing. I only assumed that a single query would be faster than multiple. I would like to test these out. Is there a particular resource I could test these options with? Thanks for the help!
briang
The 'not a single query' is indeed not very to the point imo, as all answers here are. This won't be slow with a proper `INDEX(vehicle_id,priority)`, MySQL recognizes the 'LEFT JOIN with no results'-construct with quite an optimised query plain in my experience. Difference may vary depending on the amount of records per vehicle ID. If I had to guess (and the table was properly indexed), I'd say Mark Byers' answer probably performs better with a low amount of records per vehicle_id, while this one can take more advantage of the index on high amounts of records per vehicle_id.
Wrikken
That is: compared to themselves. Comparing the performance of both those answers in the real life situation is what should be done.
Wrikken
@Mark: Sorry, missed your trailing question "In what way are the other answers not 'a single query'?". Isn't a "sub-query" an additional query?
briang
@Wrikken: Even with the index you suggested for 10000 records in pics (100 vehicle_ids, 7 priority levels) I time this at 1.3 seconds. My solution takes 0.09s for the same data. I can post scripts to generate the test data I used if you wish. I wish something this simple was fast... maybe I'm missing something, but I'm very skeptical that this method will be as fast as the other two solutions.
Mark Byers
@Mark: will run some tests tomorrow, I'm interested too, and how it may or may not alter with the amount of records per vehicle id, and w. If you already have the script in place to generate the testdata I'd be grateful if you could pastebin it somewhere, seems a waste to write my own :) Of course, I will destroy the (small) advantage of probably having the records already in order in the table just to make a point, but I don't imagine that will influence the result considerably.
Wrikken
@Wrikken: http://pastebin.org/352877
Mark Byers
@Mark: sorry for the delay, have been busy: a.t.m, with setting the most biased environment I could think of for my join solution, it is still about 3 times as slow as yours (version 5.1.47). Will keep your solution in mind.
Wrikken