views:

134

answers:

3

Hi folks

My problem is that I'm trying to count which tag has been used most in a table of user-submitted code. But the problem is with the database structure.

The current query I'm using is this:

SELECT tag1, COUNT(tag1) AS counttag 
FROM code 
GROUP BY tag1 
ORDER BY counttag DESC LIMIT 1

This is fine, except, it only counts the most often occurence of tag1 - and my database has 5 tags per post - so there's columns tag1, tag2, tag3, tag4, tag5. How do I get the highest occurring tag value from all 5 columns in one query?

Jack

+5  A: 

You should make 2-3 tables. I usually make 3:

code

  • id
  • code

code_tags:

  • code_id
  • tag_id

tags:

  • id
  • name

To make a query which shows the amount of tags used:

   SELECT t.*, COUNT(tag_id) as code_count 
     FROM code_tags ct 
LEFT JOIN tags t ON ct.tag_id = t.id 
 GROUP BY tag_id 
 ORDER BY code_count DESC 
    LIMIT 1
mark_dj
+1: For correct modeling and query. Formatting your answer so it's easier to read will pay dividends - see the markdown reference: http://stackoverflow.com/editing-help
OMG Ponies
and adding `count` field to `tags` table for storing the precalculated value of each tag count is acceptable too.
zerkms
Great answer, thank you very much! :)
Jack Webb-Heller
simple, clear and easy.
Vittorio Vittori
+1  A: 
Select tag, Count(tag) AS counttag 
From (
    Select tag1 As tag
    From code
    Union All
    Select tag2 As tag
    From code
    Union All
    Select tag3 As tag
    From code
    ...
) t
Group BY tag
Order BY counttag Desc Limit 1

The better approach would be to normalize your tables (also see mark_dj's answer):

Table code_tag (contains all tags of a code):

Code_id
Tag
Peter Lang
A: 

Well, you're correct that your database structure is the problem.

You could create an SQL query, which contains subqueries, each one passing over the same table and pulling out information about each tag field. However, I'd suggest two alternatives:

  1. Just make 5 different queries. Unless you specifically need this information in one SQL query (which seems unlikely), it's going to be much simpler to just make 5 requests to the DB and then organise the information programatically

  2. Fix your DB structure to use join tables. I'd hazard a guess that this is just the first of any number of problems that you're going to come across because of this DB structure.

By join tables, you're looking at 2 extra tables:

CREATE TABLE code_tags {
  code_id INT,
  tag_id INT
}

CREATE TABLE tags {
  id INT,
  other fields
}

Instead of having tagX fields in your code tables, you use the join table to add tags

Dancrumb