tags:

views:

50

answers:

2

Hi,

I have the following query:

SELECT title, karma, DATE(date_uploaded) as d
FROM image
ORDER BY d DESC, karma DESC

This will give me a list of image records, first sorted by newest day, and then by most karma.

There is just one thing missing: I want to only get the x images with the highest karma per day. So for example, per day I only want the 10 most karma images. I could of course run multiple queries, one per day, and then combine the results.

I was wondering if there is a smarter way that still performs well. I guess what I am looking for is a way to use LIMIT x,y per group of results?

+2  A: 

You can do it by emulating ROW_NUMBER using variables.

SELECT d, title, karma
FROM (
    SELECT
        title,
        karma,
        DATE(date_uploaded) AS d,
        @rn := CASE WHEN @prev = UNIX_TIMESTAMP(DATE(date_uploaded))
                    THEN @rn + 1
                    ELSE 1
               END AS rn,
        @prev := UNIX_TIMESTAMP(DATE(date_uploaded))
    FROM image, (SELECT @prev := 0, @rn := 0) AS vars
    ORDER BY date_uploaded, karma DESC
) T1
WHERE rn <= 3
ORDER BY d, karma DESC

Result:

'2010-04-26', 'Title9', 9
'2010-04-27', 'Title5', 8
'2010-04-27', 'Title6', 7
'2010-04-27', 'Title7', 6
'2010-04-28', 'Title4', 4
'2010-04-28', 'Title3', 3
'2010-04-28', 'Title2', 2

Quassnoi has a good article about this which explains the technique in more details: Emulating ROW_NUMBER() in MySQL - Row sampling.

Test data:

CREATE TABLE image (title NVARCHAR(100) NOT NULL, karma INT NOT NULL, date_uploaded DATE NOT NULL);
INSERT INTO image (title, karma, date_uploaded) VALUES
('Title1', 1, '2010-04-28'),
('Title2', 2, '2010-04-28'),
('Title3', 3, '2010-04-28'),
('Title4', 4, '2010-04-28'),
('Title5', 8, '2010-04-27'),
('Title6', 7, '2010-04-27'),
('Title7', 6, '2010-04-27'),
('Title8', 5, '2010-04-27'),
('Title9', 9, '2010-04-26');
Mark Byers
Thank you so much for the extensive effort. I have some concerns about the performance but will test this, I may need to cache the output anyway :)
Ferdy
@Ferdy: To improve the performance of this query add a combined index on `(date_uploaded, karma)`.
Mark Byers
A: 

Maybe this will work:

SELECT title, karma, DATE(date_uploaded) as d FROM image img WHERE id IN ( SELECT id FROM image WHERE DATE(date_uploaded)=DATE(img.date_uploaded) ORDER BY karma DESC LIMIT 10 ) ORDER BY d DESC, karma DESC

But this is not very efficient, as you don't have an index on DATE(date_uploaded) (I don't know if that would be possible, but I guess it isn't). As the table grows this can get very CPU expensive. It might be simpler to just have a loop in your code :-).

lfagundes