tags:

views:

287

answers:

7

On my blog, I want to display the all the posts from the last month. But if that is less than 10 posts, I want to show the ten most recent posts (in other words, there should never be less than 10 posts on the front page). I am wondering if there is a way to do this in a single query?

Currently, I first run this query:

select count(*) from posts where timestamp > ($thirty_days_ago)
order by timestamp desc

If that count is greater than or equal to 10:

select * from posts where timestamp > ($thirty_days_ago)
order by timestamp desc

Otherwise:

select * from posts order by timestamp desc limit 10

But this requires me to run two queries. Is there a more efficient way to do this with a single query? (I'm using MySQL.)

+5  A: 
(SELECT * FROM posts
WHERE `timestamp` >= NOW() - INTERVAL 30 DAY)
UNION
(SELECT * FROM posts
ORDER BY `timestamp` DESC
LIMIT 10);

edit: Re @doofledorfer's comment: I ran this on my test database, and it worked fine. I tried comparing timestamp to a date literal as well as the constant expression as shown in the above query, but it made no difference to the optimization plan. Of course I was using a trivial amount of data, and the optimization plan may be different if there are thousands of rows.

In any case, the OP was asking how to get the correct result in a single query, not how to make the execution plan optimal. This is a UNION query after all, and is bound to incur a filesort.

+------+--------------+------------+------+---------------+------+---------+------+------+----------------+
| id   | select_type  | table      | type | possible_keys | key  | key_len | ref  | rows | Extra          |
+------+--------------+------------+------+---------------+------+---------+------+------+----------------+
|  1   | PRIMARY      | posts      | ALL  | timestamp     | NULL | NULL    | NULL |   20 | Using where    | 
|  2   | UNION        | posts      | ALL  | NULL          | NULL | NULL    | NULL |   20 | Using filesort | 
| NULL | UNION RESULT | <union1,2> | ALL  | NULL          | NULL | NULL    | NULL | NULL |                | 
+------+--------------+------------+------+---------------+------+---------+------+------+----------------+
Bill Karwin
This in effect is still two queries.
Yaakov Ellis
It's one query. A UNION operation is similar to an OR condition in the WHERE clause
Eran Galperin
BUT it shouldn't include an expression in the first subquery. And your question is in effect two queries, so it must be expressed as such one way or another.
le dorfier
(Which means I agree with Bill but fix the first WHERE clause.)
le dorfier
Bill - to quote: Is there a more efficient way to do this with a single query?
le dorfier
(But I *did* agree with you, and up-click your answer. :))
le dorfier
+1  A: 

The only way that I can see it working with only one query is to do "select * from posts order by timestamp" which returns all posts, and then handle the display logic in your code. However, this is not a very efficient solution.

As long as you have your table properly indexed, then performing a select count(*) followed by a retrieval query shouldn't affect performance. Are there any special circumstances that would make you specifically try to avoid a second query? Otherwise, I think that your solution above is sufficient.

Yaakov Ellis
+1  A: 

No, there is no more efficient way. I'd do it the way you describe it in your question. Bill Karwin's answer is roughly equivalent if the predicate is revised as I've commented above.

All the other suggestions I've seen so far are much less efficient, even if they somehow return the right result.

le dorfier
A: 

I think you could try something like:

select * from posts 
where (timestamp >= (NOW() - INTERVAL 30 DAY)) or 
(post_id in (select post_id from posts order by timestamp desc limit 10))
order by timestamp desc
Jamal Hansen
Horribly inefficient, unoptimizable predicate.
le dorfier
A: 

Idea1: do a query to fetch always the posts for this month. Then do a cycle, counting the number of posts fetched. If, and only if, this number is less than 10, do the second query.

Idea 2: Why don't you cache your first query (Google App Engine, for example, has Caching APIs)? The number of posts for this month is unlikely to change often, so you'd remove the need for the first query in most of the cases.

friol
+1  A: 

Are you looking for a single table scan (e.g. one SELECT)? Or a single round-trip to the database server? Bill's answer has a single round-trip but two SELECTs... so whether that constitutes one or two "queries" depends on what you're actually looking for when you say "query".

If your latency to the database is very high, something like Bill's solution is best, because you're not waiting on communication very much. If the database itself is loaded and table scans are expensive, your original implementation may be better for two reasons:

  • You can cache the COUNT result, so it only gets executed once every 10 minutes or so. Now you've effectively amortized out the cost of that query (if 200 visitors hit that page in 10 minutes, you've only issued 201 SELECT statements).
  • A database engine can optimize the COUNT query to hit an index instead of the full table, making it much faster than trying to UNION several datasets together. I'm not sure if MySQL is sophisticated enough to do that or not.
Tom
+1  A: 

Just do this:

select * from posts order by timestamp desc limit 100

And filter the results further in memory. (assumes 100 is a practical upper limit for "posts in a month" that people would want to see in a single page)

This is a "more efficient single query".

David B
But I imagine your 100 assumption was the reason no one has suggested this strategy.
le dorfier
Sometimes correctness loses. The UI is a place where that happens more often than other places I have coded.
David B
Actually I really like this solution, except in my case 30 would be a more than sufficient upper limit (i don't foresee myself ever averaging more than a post a day).
Kip