views:

53

answers:

2

Hello,

I have in my database a table Articles with Article Name and Article Tags. Article tags is a string column with tags in it like this: "people, life, president". Every article is tagged in this fashion.

Now i would like to get 10 most popular tags for whole population of articles. How to do this?

+7  A: 

If you have the possibility, better change your database schema to have an article table, a tag table and an article_tags table.

This would allow for much more efficient matching of tags (like the ones you want to do), especially if you have many articles.

With your current design, you're stuck to fully iterating over the (presumably big) articles table, and maintaining a heap of tags (either through a temporary table - in which case you could do it via a stored procedure - or in applicative code). The whole would be really unefficient.

Zorglub
+1  A: 

It would be a lot easier if you normalized the database and created a separate tag table. For example, if you have a database like this:

article
    article_id
    article_title
    article_content

article_tag
    tag_name
    article_id
    UNIQUE INDEX (tag_name, article_id)
    UNIQUE INDEX (article_id, tag_name) # in order to perform fast lookups in both directions

(This could possibly be normalized even further by creating a tag-table that contains the tag_name and a tag_id, and replacing tag_name with tag_id in article_tag.)

Now you can present a list of the most popular tags using a query such as:

SELECT tag_name, count(article_id) c 
  FROM article_tag
  GROUP BY tag_name
  ORDER BY c DESC
  LIMIT 10
Emil H
You've got it backwards - the current state of the data is denormalized. Changing it to 3NF would reflect the tables you suggest.
OMG Ponies
Yep. My bad. I honestly can't say what I was thinking.
Emil H