I need to construct some rather simple SQL, I suppose, but as it's a rare event that I work with DBs these days I can't figure out the details.

I have a table 'posts' with the following columns:

id, caption, text

and a table 'comments' with the following columns:

id, name, text, post_id

What would the (single) SQL statement look like that retrieves the captions of all posts which have one or more comments associated with them through the 'post_id' key? The DBMS is MySQL, in case that matters for the query.

A: 

You're basically looking at performing a correlated subquery:

SELECT p.caption FROM posts p WHERE (SELECT COUNT(*) FROM comments c WHERE c.post_id=p.id) > 0;

This has the effect of running the SELECT COUNT(*) subquery for each row in the posts table. Depending on the size of your tables, you might consider adding an additional column, comment_count, to your posts table to store the number of corresponding comments, so that you can simply do

SELECT p.caption FROM posts p WHERE p.comment_count > 0;
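
To see what the counting subquery returns, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made so the example runs anywhere). The sample rows are made up, and the condition is written as "> 0" so that a post with exactly one comment qualifies.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, caption TEXT, text TEXT);
    CREATE TABLE comments (id INTEGER PRIMARY KEY, name TEXT, text TEXT, post_id INTEGER);
    -- Made-up sample rows: post 1 has two comments, post 2 has one, post 3 has none.
    INSERT INTO posts VALUES (1, 'First', 'a'), (2, 'Second', 'b'), (3, 'Third', 'c');
    INSERT INTO comments VALUES (1, 'ann', 'hi', 1), (2, 'bob', 'yo', 1), (3, 'cy', 'ok', 2);
""")

# The correlated subquery is evaluated once per row of posts.
rows = conn.execute("""
    SELECT p.caption
    FROM posts p
    WHERE (SELECT COUNT(*) FROM comments c WHERE c.post_id = p.id) > 0
    ORDER BY p.id
""").fetchall()
captions = [r[0] for r in rows]
print(captions)
```

The post without comments drops out, and each remaining post appears once regardless of how many comments it has.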

hark
Nooo! Denormalization should never be advocated for something this simple.
FlySwat
A: 

Just going off the top of my head here but maybe something like:

SELECT caption FROM posts WHERE id IN (SELECT post_id FROM comments GROUP BY post_id HAVING COUNT(*) > 0)
lomaxx
+6  A: 
select p.caption, count(c.id)
from posts p join comments c on p.id = c.post_id
group by p.caption
having count(c.id) > 0
Nick DeVore
The HAVING and COUNT are unnecessary since he wants anything that has 1 or more comments. Also, use INNER JOIN for clarity, even though inner join is the default.
jishi
And GROUP BY p.id instead, since that will probably be the primary key.
jishi
It should be "... > 0" rather than "... > 1", other than that it's looking good :)
Morten Christiansen
+2  A: 
SELECT DISTINCT p.caption, p.id
    FROM posts p, 
         comments c 
    WHERE c.post_ID = p.ID

I think using a join would be a lot faster than using the IN clause or a subquery.

FlySwat
but yours is better.
jishi
I'm not sure if DISTINCT would work here, or if I'd have to nest it into a subquery, i.e. SELECT DISTINCT Caption FROM (my query)... but I thought I'd throw it out there.
FlySwat
I believe distinct would work here since caption and p.id would be equal for each post.
jishi
We are probably right, but I don't have time to play with it on a DB. I'll remove my caveat anyways =)
FlySwat
A: 

SELECT caption FROM posts INNER JOIN comments ON comments.post_id = posts.id GROUP BY posts.id;

No need for a having clause or count().

edit: Should be an inner join of course (to avoid nulls if a comment is orphaned), thanks to jishi.
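
A rough sketch of the orphaned-comment case, using Python's built-in sqlite3 as a stand-in for MySQL (an assumption). The sample rows and the helper name captions are made up for illustration. An outer join that keeps every comment produces a NULL caption for the orphan, while the inner join leaves it out.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, caption TEXT, text TEXT);
    CREATE TABLE comments (id INTEGER PRIMARY KEY, name TEXT, text TEXT, post_id INTEGER);
    INSERT INTO posts VALUES (1, 'First', 'a'), (2, 'Second', 'b');
    -- Comment 2 is orphaned: post 99 does not exist.
    INSERT INTO comments VALUES (1, 'ann', 'hi', 1), (2, 'bob', 'yo', 99);
""")

def captions(join_kind):
    # Hypothetical helper; join_kind comes from the fixed literals below, never user input.
    sql = f"""
        SELECT posts.caption
        FROM comments {join_kind} JOIN posts ON comments.post_id = posts.id
        GROUP BY posts.id
    """
    return [row[0] for row in conn.execute(sql)]

outer_result = captions("LEFT")   # keeps every comment, so the orphan yields a NULL caption
inner_result = captions("INNER")  # keeps only comments that match an existing post
print(outer_result, inner_result)
```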

Tjofras
That one would give him null posts for orphaned comments, which is a bad idea. A left join would give nulls for comments. An inner join is the way to go.
jishi
A: 
SELECT DISTINCT caption
FROM posts
    INNER JOIN comments ON posts.id = comments.post_id

Forget about counts and subqueries.

The inner join will pick up all the comments that have valid posts and exclude all the posts that have 0 comments. The DISTINCT will collapse the duplicate caption entries for posts that have more than 1 comment.
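
To make the duplicate rows visible, here is a small sketch with Python's built-in sqlite3 standing in for MySQL (an assumption) and made-up sample rows. It runs the same join with and without DISTINCT.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, caption TEXT, text TEXT);
    CREATE TABLE comments (id INTEGER PRIMARY KEY, name TEXT, text TEXT, post_id INTEGER);
    -- Made-up sample rows: post 1 has two comments, post 2 has one, post 3 has none.
    INSERT INTO posts VALUES (1, 'First', 'a'), (2, 'Second', 'b'), (3, 'Third', 'c');
    INSERT INTO comments VALUES (1, 'ann', 'hi', 1), (2, 'bob', 'yo', 1), (3, 'cy', 'ok', 2);
""")

join = "FROM posts INNER JOIN comments ON posts.id = comments.post_id"

# Without DISTINCT the join yields one row per matching comment.
plain = [r[0] for r in conn.execute(f"SELECT caption {join} ORDER BY posts.id")]

# With DISTINCT the repeated captions collapse into one row each.
distinct = [r[0] for r in conn.execute(f"SELECT DISTINCT caption {join} ORDER BY caption")]

print(plain)
print(distinct)
```

Note that DISTINCT compares caption values, so two different posts that happen to share a caption would also end up as a single row.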

James
Does MySQL not have the inferred inner join syntax? (Look at my post, which is T-SQL.)
FlySwat
A: 

I find this syntax to be the most readable in this situation:

SELECT * FROM posts P 
  WHERE EXISTS (SELECT * FROM comments WHERE post_id = P.id)

It expresses your intent better than most of the others in this thread: "give me all the posts ..." (select * from posts) "... that have any comments" (where exists (select * from comments ... )). It returns the same posts as the joins above, but because you're not actually doing a join, you don't have to worry about getting duplicates of the records in posts, so you'll get just one record per post.
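
A quick sketch of the one-record-per-post behaviour, again using Python's built-in sqlite3 as a stand-in for MySQL (an assumption) with made-up sample rows. Post 1 has two comments but should still come back only once.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, caption TEXT, text TEXT);
    CREATE TABLE comments (id INTEGER PRIMARY KEY, name TEXT, text TEXT, post_id INTEGER);
    -- Made-up sample rows: post 1 has two comments, post 2 has one, post 3 has none.
    INSERT INTO posts VALUES (1, 'First', 'a'), (2, 'Second', 'b'), (3, 'Third', 'c');
    INSERT INTO comments VALUES (1, 'ann', 'hi', 1), (2, 'bob', 'yo', 1), (3, 'cy', 'ok', 2);
""")

# EXISTS only asks whether at least one matching comment is present,
# so no DISTINCT or GROUP BY is needed to avoid duplicate posts.
with_comments = conn.execute("""
    SELECT p.id, p.caption
    FROM posts p
    WHERE EXISTS (SELECT * FROM comments c WHERE c.post_id = p.id)
    ORDER BY p.id
""").fetchall()
print(with_comments)
```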

Ian Varley
You have also posted the slowest query yet =)
FlySwat
Hmm ... I think that depends on the DB engine. The query isn't *inherently* slower, because it is relationally equivalent to a join. A good optimizer should run this just as fast as a join, or faster if it can avoid materializing the join results for the distinct. Course, I haven't tested it. :)
Ian Varley