I'm not even sure this is possible to do efficiently, but here's my problem:
I'm writing what's essentially a blog engine where a blog post and all replies to each blog post can be tagged.
So, I could have a blog post tagged "stack", and a reply to that post tagged "overflow".
Right now, I'm trying to generate a list of the most popular tags when a user hits a special page in my application. It should return the n most popular tags, ordered by descending number of blog posts, along with the number of blog posts associated with each tag. A post should count toward a tag even if only a reply in that post, and not the post itself, is tagged with it.
So, if BlogPost A is tagged with "foo", and a reply in BlogPost B is tagged with "foo", the popular tag summary should count that as two blog posts in total, even though BlogPost B is not technically tagged.
Here's a description of the tables/fields that might be relevant:
BlogPosts
| id # Primary key for all tables, Rails-style
BlogComments
| id
| blog_post_id
Tags
| id
| name # 'foo'
Taggings
| id
| tag_id
| blog_post_id
| blog_comment_id
There's some denormalization in Taggings for the sake of convenience. If someone tags a BlogPost, it fills in the blog_post_id field, and blog_comment_id remains NULL. If someone tags a comment on a post, it fills in both blog_post_id and blog_comment_id.
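To make the layout concrete, here is a minimal sketch of it using Python's built-in sqlite3 module. The snake_case table names, the column types, and the helpers `tag_post` and `tag_comment` are assumptions made up for illustration; the real application is Rails and may differ. It only shows the denormalization rule: a post tagging leaves blog_comment_id NULL, and a comment tagging fills in both ids.

```python
import sqlite3

# In-memory database; table names and column types are assumed, not taken from the real app.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE blog_posts (id INTEGER PRIMARY KEY);
CREATE TABLE blog_comments (
    id INTEGER PRIMARY KEY,
    blog_post_id INTEGER NOT NULL REFERENCES blog_posts(id)
);
CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE taggings (
    id INTEGER PRIMARY KEY,
    tag_id INTEGER NOT NULL REFERENCES tags(id),
    blog_post_id INTEGER NOT NULL REFERENCES blog_posts(id),
    blog_comment_id INTEGER REFERENCES blog_comments(id)  -- NULL when the post itself is tagged
);
""")


def tag_post(conn, tag_id, post_id):
    """Hypothetical helper: tag a post directly, leaving blog_comment_id NULL."""
    conn.execute(
        "INSERT INTO taggings (tag_id, blog_post_id, blog_comment_id) VALUES (?, ?, NULL)",
        (tag_id, post_id),
    )


def tag_comment(conn, tag_id, comment_id):
    """Hypothetical helper: tag a comment, copying its parent post id into the tagging."""
    (post_id,) = conn.execute(
        "SELECT blog_post_id FROM blog_comments WHERE id = ?", (comment_id,)
    ).fetchone()
    conn.execute(
        "INSERT INTO taggings (tag_id, blog_post_id, blog_comment_id) VALUES (?, ?, ?)",
        (tag_id, post_id, comment_id),
    )


# BlogPost A (id 1) is tagged "foo"; a reply in BlogPost B (id 2) is tagged "foo".
conn.executemany("INSERT INTO blog_posts (id) VALUES (?)", [(1,), (2,)])
conn.execute("INSERT INTO blog_comments (id, blog_post_id) VALUES (1, 2)")
conn.execute("INSERT INTO tags (id, name) VALUES (1, 'foo')")
tag_post(conn, 1, 1)
tag_comment(conn, 1, 1)

rows = conn.execute(
    "SELECT blog_post_id, blog_comment_id FROM taggings ORDER BY id"
).fetchall()
print(rows)
```

Because every tagging row carries a blog_post_id, a summary query should be able to work from Taggings and Tags alone, without joining through BlogComments.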
Is there some way to return a sorted list of the most popular tags in one or several SQL queries? I'm thinking I might need to just run a computationally expensive script every few minutes on a cron job and render the cached output instead of running this every time somebody hits the page...
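One shape such a query might take, sketched here with Python's sqlite3 module purely for illustration: since blog_post_id is filled in for both post and comment taggings, counting distinct blog_post_id values per tag may be enough. The snake_case table names, the trimmed-down schema, the sample rows, and the function `popular_tags` are all assumptions, and this is untested against a real data set or any other database.

```python
import sqlite3


def popular_tags(conn, n):
    """Return up to n (tag name, post count) pairs, most popular first.

    COUNT(DISTINCT ...) counts each post once per tag, no matter how many
    of its comments (or the post itself) carry that tag.
    """
    return conn.execute(
        """
        SELECT tags.name, COUNT(DISTINCT taggings.blog_post_id) AS post_count
        FROM taggings
        JOIN tags ON tags.id = taggings.tag_id
        GROUP BY tags.id, tags.name
        ORDER BY post_count DESC, tags.name
        LIMIT ?
        """,
        (n,),
    ).fetchall()


# Assumed minimal schema: only the two tables the query touches.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE taggings (
    id INTEGER PRIMARY KEY,
    tag_id INTEGER NOT NULL,
    blog_post_id INTEGER NOT NULL,
    blog_comment_id INTEGER
);
""")
conn.executemany("INSERT INTO tags (id, name) VALUES (?, ?)", [(1, "foo"), (2, "bar")])
conn.executemany(
    "INSERT INTO taggings (tag_id, blog_post_id, blog_comment_id) VALUES (?, ?, ?)",
    [
        (1, 1, None),  # post 1 itself tagged "foo"
        (1, 2, 10),    # a comment in post 2 tagged "foo"
        (1, 2, 11),    # another comment in post 2 tagged "foo"; post 2 still counts once
        (2, 1, None),  # post 1 itself tagged "bar"
    ],
)

top = popular_tags(conn, 2)
print(top)
```

Whether this is fast enough to run on every page hit would depend on the size of Taggings; an index on (tag_id, blog_post_id) is a plausible first thing to try before falling back to a cached result.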
Thanks!