tags:

views:

162

answers:

3

Hello.

I have the following table on a MySQL 5.1.30:

CREATE TABLE  article (
  article_id int(10) unsigned NOT NULL AUTO_INCREMENT,
  category_id int(10) unsigned NOT NULL,
  title varchar(100) NOT NULL,
  PRIMARY KEY (article_id)
);

With this information:

1, 1, 'foo'
2, 1, 'bar'
3, 1, 'baz'
4, 1, 'quox'
5, 2, 'quonom'
6, 2, 'qox'

I need to obtain the first three articles in each category for all categories that have articles. Something like this:

1, 1, 'foo'
2, 1, 'bar'
3, 1, 'baz'
5, 2, 'quonom'
6, 2, 'qox'

Of course a union would work:

select * from articles where category_id = 1 limit 3
union
select * from articles where category_id = 2 limit 3

But there are an unknown number of categories in the database. Also, the order should specified by an is_sticky and a published_date columns I left out of the examples to simplify.

Is it possible to build a query that retrieves this information?

UPDATE: I've tried the following which would seemed to work except that MySQL doesn't support the limit clause inside a subselect. Do you know of a way to simulate the limit there?

select *
  from articles a
  where a.article_id in (select f.article_id
                      from articles f
                      where f.category_id = a.category_id
                      order by f.is_sticky, f.published_at
                      limit 3)

Thanks

A: 

You probably should add another table containing the category_id and a description of the categories. Then you can query that table for a list of category IDs, and use a subquery or additional queries to get the articles with proper sorting and limiting. I don't have time to write this out fully now, but someone else probably will (or I'll do it in the unlikely event that no one else has responded by the time I get back).

kquinn
I already have a categories table but I don't quite get what you are saying. What I need to limit is the number of articles per category not the number of categories.
A: 

Here's something I'm not proud of (in MS SQL - not sure if it'll work in MySQL)

select a2.article_id, a2.category_id, a2.title
from 
    (select distinct category_id
    from article) as a1
    inner join article a2 on a2.category_id = a1.category_id

where a2.article_id <= (
    select top 1 a4.article_id 
    from (
         select top 3 a3.article_id 
         from article a3 
         where a3.category_id = a1.category_id 
         order by a3.article_id asc
    ) a4 
    order by a4.article_id desc)

It'll depend on MySQL supporting subqueries in this manner. Basically it works out the third-largest article_id for each category and joins all articles less than or equal to that per category.

SELECT TOP n * should work the same as SELECT * LIMIT n, I hope...

geofftnz
Mmmm, I need to order by two columns. Can't see a way to filter record by using the ID. Remember that the is_stiky column is boolean and can move an article up.
Yeah, sorry I missed that when I was reading the question. I'll defer to people who know MySQL, because I can really only do MSSQL (and Oracle, but that was many years ago).
geofftnz
+1  A: 

SELECT ... LIMIT isn't supported in subqueries, I'm afraid, so it's time to break out the self-join magic:

SELECT article.*
FROM article
JOIN (
    SELECT a0.category_id AS id, MIN(a2.article_id) AS lim
    FROM article AS a0
    LEFT JOIN article AS a1 ON a1.category_id=a0.category_id AND a1.article_id>a0.article_id
    LEFT JOIN article AS a2 ON a2.category_id=a1.category_id AND a2.article_id>a1.article_id
    GROUP BY id
) AS cat ON cat.id=article.category_id
WHERE article.article_id<=cat.lim OR cat.lim IS NULL
ORDER BY article_id;

The bit in the middle is working out the ID of the third-lowest-ID article for each category by trying to join three copies of the same table in ascending ID order. If there are fewer than three articles for a category, the left joins will ensure the limit is NULL, so the outer WHERE needs to pick up that case as well.

If your “top 3” requirement might change to “top n” at some point, this begins to get unwieldy. In that case you might want to reconsider the idea of querying the list of distinct categories first then unioning the per-category queries.

ETA: Ordering on two columns: eek, new requirements! :-)

It depends what you mean: if you're only trying to order the final results you can bang it on the end no problem. But if you need to use this ordering to select which three articles are to be picked things are a lot harder.

We are using a self-join with ‘<’ to reproduce the effect ‘ORDER BY article_id’ would have. Unfortunately, whilst you can do ‘ORDER BY a, b’, you can't do ‘(a, b)<(c, d)’... neither can you do ‘MIN(a, b)’. Plus, you'd actually be ordering by three columns, issticky, published and article_id, because you need to ensure that each ordering value is unique, to avoid getting four or more rows returned.

Whilst you could make up your own orderable value by some crude integer or string combination of columns:

LEFT JOIN article AS a1
ON a1.category_id=a0.category_id
AND HEX(a1.issticky)+HEX(a1.published_at)+HEX(a1.article_id)>HEX(a0.issticky)+HEX(a0.published_at)+HEX(a0.article_id)

this is getting unfeasibly ugly, and the calculations will scupper any chance of using the indices to make the query efficient. At which point you are better off simply doing the separate per-category LIMITed queries.

bobince
That's a great solution! Where would you add the order by clause? Remember that rows should be ordered by f.is_sticky and then by f.published_at.