I am making a website where users can vote on which category a page belongs to. They can vote that the page is in category a, b, c, or d.

I need to find the most commonly occurring category for each page, out of all the votes stored in the MySQL table.

Each time a user submits their vote, it submits the "category" that they voted for, and the "page_id".

I have this so far:

SELECT    page_id, category
FROM      categories
GROUP BY  page_id

I cannot use a COUNT(*) WHERE category = 'a' and then repeat it for each category, because there are many more categories in the actual project.

+1  A: 

Something like

SELECT category, page_id, count(vote_id)
FROM categories
WHERE category in ('a', 'b', 'c', 'd')
GROUP BY category, page_id
ORDER BY count(vote_id) DESC
LIMIT 1

should do the trick. I assume here that each vote is stored in its own row.

It only looks at the categories you're interested in, sorts the groups with the most votes first, and returns only the first one.
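Because LIMIT 1 yields a single row overall, the query above gives the top (category, page_id) pair across all pages rather than one winner per page. A minimal sketch of a single-page variant follows, run through Python's built-in sqlite3 module only so that it is self-contained. It assumes the two-column table from the question (no vote_id column, so it counts rows with COUNT(*)); the helper name top_category and the sample votes are made up for illustration, and the extra ORDER BY column is an added tie-break that is not part of the original answer.

```python
import sqlite3

# Assumed schema: one row per vote, with only page_id and category.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE categories (page_id INTEGER, category TEXT)")

# Hypothetical sample votes, for illustration only.
votes = [(1, "a"), (1, "b"), (1, "a"), (2, "d"), (2, "d"), (2, "c")]
conn.executemany("INSERT INTO categories VALUES (?, ?)", votes)


def top_category(conn, page_id):
    """Return (category, votes) for the most-voted category of one page, or None."""
    return conn.execute(
        """
        SELECT   category, COUNT(*) AS votes
        FROM     categories
        WHERE    page_id = ?
        GROUP BY category
        ORDER BY votes DESC, category   -- ties resolved alphabetically
        LIMIT    1
        """,
        (page_id,),
    ).fetchone()


print(top_category(conn, 1))
print(top_category(conn, 2))
```

The SELECT itself should carry over to MySQL unchanged apart from the placeholder syntax of whichever client library is used.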

Peter Tillemans
+1  A: 

If your table looks something like this:

SELECT * from categories;
+---------+----------+
| page_id | category |
+---------+----------+
|       1 | a        |
|       1 | b        |
|       1 | a        |
|       1 | c        |
|       1 | a        |
|       1 | b        |
|       1 | a        |
|       2 | d        |
|       2 | d        |
|       2 | c        |
|       2 | d        |
|       3 | a        |
|       3 | b        |
|       3 | c        |
|       4 | c        |
|       4 | d        |
|       4 | c        |
+---------+----------+
17 rows in set (0.00 sec)

Then you may want to try this query:

SELECT   c1.page_id, MAX(freq.total),
         (
            SELECT   c2.category
            FROM     categories c2
            WHERE    c2.page_id = c1.page_id
            GROUP BY c2.category
            HAVING   COUNT(*) = MAX(freq.total)
            LIMIT    1
         ) AS category
FROM     categories c1 
JOIN     (
            SELECT   page_id, category, count(*) total 
            FROM     categories 
            GROUP BY page_id, category
         ) freq ON (freq.page_id = c1.page_id) 
GROUP BY c1.page_id;

Which returns this:

+---------+-----------------+----------+
| page_id | MAX(freq.total) | category |
+---------+-----------------+----------+
|       1 |               4 | a        |
|       2 |               3 | d        |
|       3 |               1 | a        |
|       4 |               2 | c        |
+---------+-----------------+----------+
4 rows in set (0.00 sec)

Compare the results with the actual frequency distribution:

SELECT page_id, category, COUNT(*) FROM categories GROUP BY page_id, category;
+---------+----------+----------+
| page_id | category | COUNT(*) |
+---------+----------+----------+
|       1 | a        |        4 |
|       1 | b        |        2 |
|       1 | c        |        1 |
|       2 | c        |        1 |
|       2 | d        |        3 |
|       3 | a        |        1 |
|       3 | b        |        1 |
|       3 | c        |        1 |
|       4 | c        |        2 |
|       4 | d        |        1 |
+---------+----------+----------+
10 rows in set (0.00 sec)

Note that for page_id = 3 there is a three-way tie, so no category has a leading frequency. In that case this query makes no guarantee about which category will be chosen.
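If a tie such as page_id = 3 needs a predictable winner, one option is to pick the alphabetically smallest category among those sharing the top count. The sketch below is a different formulation from the query above, not a correction of it: it joins the per-category counts to the per-page maximum and takes MIN(category) among the matches. It is run through Python's built-in sqlite3 module with the same 17 sample rows so it can be checked end to end; the aliases f, g, m and top are arbitrary, and the alphabetical tie-break rule is an assumption about what "predictable" should mean here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE categories (page_id INTEGER, category TEXT)")

# The same 17 votes as in the sample table above.
votes = (
    [(1, c) for c in "abacaba"]
    + [(2, c) for c in "ddcd"]
    + [(3, c) for c in "abc"]
    + [(4, c) for c in "cdc"]
)
conn.executemany("INSERT INTO categories VALUES (?, ?)", votes)

QUERY = """
SELECT   f.page_id, MIN(f.category) AS category, f.total
FROM     (
            SELECT   page_id, category, COUNT(*) AS total
            FROM     categories
            GROUP BY page_id, category
         ) f
JOIN     (
            -- highest vote count reached by any category on each page
            SELECT   page_id, MAX(total) AS top
            FROM     (
                        SELECT   page_id, category, COUNT(*) AS total
                        FROM     categories
                        GROUP BY page_id, category
                     ) g
            GROUP BY page_id
         ) m ON m.page_id = f.page_id AND m.top = f.total
GROUP BY f.page_id, f.total
ORDER BY f.page_id
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Every selected column is either grouped or aggregated, so the statement does not rely on MySQL's relaxed GROUP BY behaviour.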

Daniel Vassallo