tags:

views:

390

answers:

7

Let's say I have two tables, news and comments.

news (
  id,
  subject,
  body,
  posted
)

comments (
  id,
  parent,  // points to news.id
  message,
  name,
  posted
)

I would like to create one query that grabs the latest x # of news item along with the name and posted date for the latest comment for each news post.

Speed matters in terms of selecting ALL the comments in a subquery is not an option.

+1  A: 

If speed is that important, why not create a recent_comment table that contains the id and parent id of just the most recent comments? Every time a comment is posted on a news post, replace that news id's most recent comment id. Create an index on the news id column of the new table and your joins will be fast.

You'd be trading write speed for read speed, but not by a whole lot.

Welbog
Why this would work in some cases, in my specific case that i'm working on the latest comment may not be the same for everyone. I simplified the problem so I can get real code samples back that are manageable.
Chris Bartow
Ah, I see. Well, chances are you're not going to get a very efficient query out of this beyond what you've probably already figured out. Can one make the assumption that in the comments table, the id column is increasing with time? As in, for the comment with max(posted), does it also have max(id)?
Welbog
max(id) and max(posted) where parent = x should return the same results.
Chris Bartow
Because if so you can have a query likeSELECT MAX(id), MAX(posted), parentFROM commentsGROUP BY parentWhich would be efficient depending on what indexes you have defined on the comments table.
Welbog
But I need the name field and MAX(name) won't work. :)
Chris Bartow
OK, I'll explain what I mean in another answer for you. Give me a minute, and know that it might not work for you at all...
Welbog
+1  A: 

Assuming posted is a unique timestamp, otherwise choose a unique autonumber

select c.id, c.parent, c.message, c.name, c.posted

  c.message, c.name, 

  c.posted -- same as comment_latest.recent

from comments c
join
(
    select parent, max(posted) as recent
    from comments
    group by parent
) as comment_latest
on c.parent = comment_latest.parent
and c.posted = comment_latest.recent


Complete(displays news information):

select 

  n.id as news_id, n.subject, n.body, n.posted as news_posted_date

  c.id as comment_id, 
  c.message, c.name as commenter_name, c.posted as comment_posted_date

from comments c
join
(
    select r.parent, max(r.posted) as recent
    from comments r
    join 
    (
       select id from news order by id desc limit $last_x_news
    ) news l
    on r.parent = l.id
    group by r.parent
) as comment_latest
on c.parent = comment_latest.parent
and c.posted = comment_latest.recent

join news n on c.parent = n.id


NOTE:

The above code is not subquery, it is table-deriving query. It is faster than subquery. This is subquery(slow):

select 
  id,
  subject,
  body,
  posted as news_posted_date,
  (select id from comments where parent = news.id order by posted desc limit 1) as comment_id,
  (select message from comments where parent = news.id order by posted desc limit 1) as message,
  (select name from comments where parent = news.id order by posted desc limit 1) as name,
  (select posted from comments where parent = news.id order by posted desc limit 1) as comment_posted_date,
from news
Michael Buen
This doesn't work unless comment.posted is guaranteed to be unique for a given news post. In my experience making this assumption with datetime columns is a bad idea.
Welbog
This doesn't pull the name field correctly (just random) and it selects ALL in the subquery which means it took 50 seconds to run.
Chris Bartow
A: 

Given the constraints brought to light in the comments of my other answer, I have a new idea that may or may not make any sense in practise.

Create a view (or function if it's more appropriate) with the following definition, called recent_comments:

SELECT MAX(id), parent
FROM comments
GROUP BY parent

If you have a clustered index on the parent column, this is probably a reasonably fast query, but even then it will still be a bottleneck.

Using this, the query you need to get your answer is something like,

SELECT news.*, comments.*
FROM news
INNER JOIN recent_comments
ON news.id = recent_comments.parent
INNER JOIN comments
ON comments.id = recent_comments.id

Plus considerations for news posts that don't have any comments yet.

Welbog
This is similar to Michael's answer in that the subquery would select ALL of the comments and on a large data set it's just too slow.
Chris Bartow
Yeah, I see that. When you need to find the "biggest" record out of a set of records, you need to search all of the records. If you had a clustered index on both (parent,id) in the comments table, then depending on MySQL's optimizer it might know to grab the last comment in the index, but I doubt it
Welbog
+1  A: 
SELECT news.subject, news.body, comments.name, comments.posted
FROM news
INNER JOIN comments ON
(comments.parent = news.id)
WHERE comments.parent = news.id
AND comments.id = (SELECT MAX(id)
FROM comments
WHERE parent = news.id)
ORDER BY news.id

This gets all the news items, along with the related comment with the highest id value, which in theory should be the latest.

_J_
So my accepted answer was later rejected because someone took it and changed the join?
_J_
Your answer ended up not including news posts without comments. So new news would never show up.
Chris Bartow
I agree but I led you in the right direction. Disappointing to be accepted then rejected after someone took my work and changed the join.
_J_
+1  A: 

My solution is similar to J but I think he added one line that is unnecessary:

SELECT news.*, comments.name, comments.posted FROM news INNER JOIN comments ON news.id = comments.parent WHERE comments.id = (SELECT max(id) FROM comments WHERE parent = news.id )

Not sure of the speed on an extremely large table though.

Randy Allen
Yes that works just as well.
_J_
+2  A: 

I just realized the query does not return results if there are no comments attached to the news table, here's the fix as well as an added column for the total # of posts:

SELECT news.*, comments.name, comments.posted, (SELECT count(id) FROM comments WHERE comments.parent = news.id) AS numComments
FROM news
LEFT JOIN comments
ON news.id = comments.parent
AND comments.id = (SELECT max(id) FROM comments WHERE parent = news.id)
Randy Allen
A: 

Hi guys! i think the solution provided by "Jan" is the best i.e create the "View" and inner join it with the sql statement. It'll definitely reduce the time to pull the data, i tested it and it works 100%.

Adil