views:

219

answers:

3

I'll try to explain this better on another question. This is the query that I think should work but, off course, MySQL doesn't support this specific subselect query:

select *
  from articles a
  where a.article_id in
      (select f.article_id
        from articles f
        where f.category_id = a.category_id
        order by f.is_sticky, f.published_at
        limit 3) /* limit isn't allowed inside a IN subquery */

What I'm trying to archive is this: In an articles table I have several articles for several categories. I need to obtain at most three articles per category (any number of categories).

Here's the data:

CREATE TABLE articles (
  article_id int(10) unsigned NOT NULL AUTO_INCREMENT,
  category_id int(10) unsigned NOT NULL,
  title varchar(100) NOT NULL,
  is_sticky boolean NOT NULL DEFAULT 0,
  published_at datetime NOT NULL,
  PRIMARY KEY (article_id)
);

INSERT INTO articles VALUES
(1, 1, 'foo', 0, '2009-02-06'),
(1, 1, 'bar', 0, '2009-02-07'),
(1, 1, 'baz', 0, '2009-02-08'),
(1, 1, 'qox', 1, '2009-02-09'),

(1, 2, 'foo', 0, '2009-02-06'),
(1, 2, 'bar', 0, '2009-02-07'),
(1, 2, 'baz', 0, '2009-02-08'),
(1, 2, 'qox', 1, '2009-02-09');

What I'm trying to retrieve is the following:

1, 1, qox, 1, 2009-02-09
1, 1, foo, 0, 2009-02-06
1, 1, bar, 0, 2009-02-07
1, 2, qox, 1, 2009-02-09
1, 2, foo, 0, 2009-02-06
1, 2, bar, 0, 2009-02-07

Notice how 'quox' jumped to the first place in it's category because it is sticky.

Can you figure out a way to avoid the LIMIT inside the subquery?

Thanks

A: 

I've found a (horrible, horrible) workaround that I probably shouldn't even post but...

select *
  from articles a
  where a.article_id =
      (select f.article_id
        from articles f
        where f.category_id = a.category_id
        order by f.is_sticky, f.published_at
        limit 1)
union
select *
  from articles a
  where a.article_id =
      (select f.article_id
        from articles f
        where f.category_id = a.category_id
        order by f.is_sticky, f.published_at
        limit 1, 1)
union
select *
  from articles a
  where a.article_id =
      (select f.article_id
        from articles f
        where f.category_id = a.category_id
        order by f.is_sticky, f.published_at
        limit 2, 1)
order by category_id

Since I only needed three articles per category I can repeat the query three times (instead of repeating it for each category) one for the first article in all categories, one for the second article and one for the third article in all categories and the join them all and order by the category.

It seems LIMIT isn't supported in combination with IN but works just fine to retrieve one record at a time.

If you have a better way I'm still interested in your solution.

Thanks

+1  A: 

This is a simplification of your solution

    select *
  from articles a
  where a.article_id =
      (select f.article_id
        from articles f
        where f.category_id = a.category_id
        order by f.is_sticky, f.published_at
        limit 1) or a.article_id =
      (select f.article_id
        from articles f
        where f.category_id = a.category_id
        order by f.is_sticky, f.published_at
        limit 1, 1) or 
    a.article_id =
      (select f.article_id
        from articles f
        where f.category_id = a.category_id
        order by f.is_sticky, f.published_at
        limit 2, 1)
Luis Melgratti
+1  A: 

Have a look at this code snippet called Within-group quotas (Top N per group).

Depending on the size of your set, there's two solutions proposed, one playing with a count and the other one using a temporary table for bigger tables.

So basically, if you have a big table, until MySQL implements a LIMIT in subqueries or something similar, you'll have to manually (well, or using a dynamic query in a loop) aggregate all your categories with one of the proposed solutions here.


// A solution using a temporary table and a stored procedure:

Run that once:

DELIMITER //
CREATE PROCEDURE top_articles()
BEGIN
    DECLARE done INT DEFAULT 0;
    DECLARE catid INT;
    DECLARE cur1 CURSOR FOR SELECT DISTINCT(category_id) FROM articles;
    DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;
    OPEN cur1;
    # This temporary table will hold all top N article_id for each category
    CREATE TEMPORARY TABLE top_articles (
        article_id int(10) unsigned NOT NULL
    );
    # Loop through each category
    REPEAT
        FETCH cur1 INTO catid;
        INSERT INTO top_articles
        SELECT article_id FROM articles
        WHERE category_id = catid
        ORDER BY is_sticky DESC, published_at
        LIMIT 3;
    UNTIL done END REPEAT;
    # Get all fields in correct order based on our temporary table
    SELECT * FROM articles WHERE article_id 
    IN (SELECT article_id FROM top_articles)
    ORDER BY category_id, is_sticky DESC, published_at;
    # Remove our temporary table
    DROP TEMPORARY TABLE top_articles;
END;
//
DELIMITER ;

And then, to try it out:

CALL top_articles();

You should see the results you were waiting for. And it should work for any number of articles per category, and any number of categories easily. This is what I'm getting:

+------------+-------------+-------+-----------+---------------------+
| article_id | category_id | title | is_sticky | published_at        |
+------------+-------------+-------+-----------+---------------------+
|          5 |           1 | qox   |         1 | 2009-02-09 00:00:00 | 
|          1 |           1 | foo   |         0 | 2009-02-06 00:00:00 | 
|          2 |           1 | foo   |         0 | 2009-02-06 00:00:00 | 
|          9 |           2 | qox   |         1 | 2009-02-09 00:00:00 | 
|          6 |           2 | foo   |         0 | 2009-02-06 00:00:00 | 
|          7 |           2 | bar   |         0 | 2009-02-07 00:00:00 | 
+------------+-------------+-------+-----------+---------------------+

Although I don't know how it would translate performance-wise. It probably can be optimized and cleaned out a little bit.

lpfavreau
That's very good link. Thanks. Though I can't use those solutions. Still I would up-vote your answer if I could.
(I don't have enough rep)
I though we could up-vote with 15 points and you seem to have 23. But don't worry about that, I'm just curious about how we could do it. :)
lpfavreau
Here, you might try the temporary solution adapted specifically for your articles example. Good luck!
lpfavreau