tags:

views:

455

answers:

3

I'm not even sure this is possible to do efficiently, but here's my problem:

I'm writing what's essentially a blog engine where a blog post and all replies to each blog post can tagged.

So, I could have a blog post tagged "stack", and a reply to that post tagged "overflow".

Right now, I'm trying to generate a list of the most popular tags when a user hits a special page in my application. It should return not only the n most popular tags by descending number of blog posts, but also the number of blog posts associated with each tag, even if a reply in that post but not the post itself is tagged with that tag.

So, if BlogPost A is tagged with "foo", and a reply in BlogPost B is tagged with "foo", the popular tag summary should count that as two blog posts in total, even though BlogPost B is not technically tagged.

Here's a description of the tables/fields that might be relevant:

BlogPosts
| id     # Primary key for all tables, Rails-style

BlogComments
| id
| blog_post_id

Tags
| id
| name   # 'foo'

Taggings
| id
| tag_id
| blog_post_id
| blog_comment_id

There's some denormalization in Taggings for the sake of convenience. If someone tags BlogPost, it fills in the blog_post_id field, and blog_comment_id remains NULL. If someone tags a comment to a post, it fills in both blog_post_id and blog_comment_id.

Is there some way to return a sorted list of the most popular tags in one or several SQL queries? I'm thinking I might need to just run a computationally-expensive script every few minutes on a cron job and render the cached output instead of running this every time somebody hits the page...

Thanks!

+1  A: 

So far I see nothing complicated in your request:

SELECT
  tag_id,
  COUNT(blog_post_id) + COUNT(blog_comment_id) tag_count
FROM
  Taggings
GROUP BY
  tag_id
ORDER BY
  COUNT(blog_post_id) + COUNT(blog_comment_id) DESC

If you want to count "affected blog posts" only, I think that's the way:

SELECT
  t.id    tag_id,
  t.name  tag_name,
  COUNT(DISTINCT COALESCE(x.blog_post_id, c.blog_post_id)) tag_count
FROM
  Tags                    t  
  INNER JOIN Taggings     x ON x.tag_id = t.id
  LEFT  JOIN BlogComments c ON c.id     = x.blog_comment_id
GROUP BY
  t.id,
  t.name
ORDER BY
  COUNT(DISTINCT COALESCE(x.blog_post_id, c.blog_post_id)) DESC
Tomalak
That returns a tag count of 2 when I tag a blog comment, since COUNT(blog_post_id) is 1 and COUNT(blog_comment_id) is 1.
O. Frabjous-Dey
Huh - but I think that if I take out the "+ COUNT(blog_comment_id)" parts it gives me exactly what I wanted.Cool! I guess that wasn't that bad at all. Thanks.
O. Frabjous-Dey
I thought you wanted to count both comments and posts? I may have misunderstood you.
Tomalak
Admittedly, the queries are untested and I am not entirely sure that the second one works. If it doesn't, please remove the "accepted" check mark.
Tomalak
A: 

I may be missing something obvious but since you have "If someone tags a comment to a post, it fills in both blog_post_id and blog_comment_id", the following sql should do the trick. I'm assuming here that Tags.name here will be unique.

SELECT MIN(ts.tag_id), t.name, COUNT(ts.blog_post_id) as rank
FROM Taggings ts
    INNER JOIN Tags t ON ts.tag_id = t.id
GROUP BY t.name
ORDER BY COUNT(ts.blog_post_id) DESC

Hope that's what your looking for.

Josh
A: 

I didn't tried, but what about something like this?:

select t.Id, 
 t.Name, 
 count(*)

from Taggings tings
inner join Tags t
 on (t.id = tings.blog_post_id or t.id = tings.blog_comment_id)

group by t.Id, t.Name
order by count(*) desc
Diego Jancic