views:

196

answers:

4

Hello,

In MySQL I have a table called "meanings" with three columns:

  • "person" (int),
  • "word" (byte, 16 possible values)
  • "meaning" (byte, 26 possible values).

A person assigns one or more meanings to each word:

person word meaning
-------------------
1      1    4
1      2    19
1      2    7  <-- Note: second meaning for word 2
1      3    5
...
1      16   2

Then another person, and so on. There will be thousands of persons.

I need to find for each of the 16 words the top three meanings (with their frequencies). Something like:

+--------+-----------------+------------------+-----------------+
| Word   | 1st Most Ranked | 2nd Most Ranked  | 3rd Most Ranked |
+--------+-----------------+------------------+-----------------+
| 1      | meaning 5 (35%) | meaning 19 (22%) | meaning 2 (13%) |
| 2      | meaning 8 (57%) | meaning 1 (18%)  | meaning 22 (7%) |
+--------+-----------------+------------------+-----------------+

...

Is it possible to solve this with a single MySQL query?

A: 

Of course you can do

 SELECT * FROM words WHERE word = 2 ORDER BY meaning DESC LIMIT 3

But this is cheating since you need to create a loop. Im working on a better solution

Henri
Indeed, I was looking for a single-query solution...
Amenhotep
+1  A: 

Well, if you group by word and meaning, you can easily get the % of people who use each word/meaning combination out of the dataset.

In order to limit the number of meanings for each word returned, you will need create some sort of filter per word/meaning combination.

Seems like you just want the answer to your homework, so I wont post more than this, but this should be enough to get you on the right track.

NickLarsen
For the first part, google how to make a frequency distribution query. For the second part, join the frequency distribution query with your filter query.
NickLarsen
+1 for homework(?)
rlb.usa
A: 

I believe the problem I had a while ago looks similar. I ended up with the @counter thing.

Savageman
A: 

Note about the problem

Let's suppose there is only one person, who says:

+--------+----------------+
| Person | Word | Meaning |
+--------+----------------+
| 1      | 1    | 7       |
| 1      | 1    | 3       |
| 1      | 2    | 8       |
+--------+----------------+

The report should read:

+--------+------------------+------------------+-----------------+
| Word   | 1st Most Ranked  | 2nd Most Ranked  | 3rd Most Ranked |
+--------+------------------+------------------+-----------------+
| 1      | meaning 7 (100%) | meaning 3 (100%) | NULL            |
| 2      | meaning 8 (100%) | NULL             | NULL            |
+--------+------------------+------------------+-----------------+

The following is not OK (50% frequency is absurd in a population of one person):

+--------+------------------+------------------+-----------------+
| Word   | 1st Most Ranked  | 2nd Most Ranked  | 3rd Most Ranked |
+--------+------------------+------------------+-----------------+
| 1      | meaning 7  (50%) | meaning 3  (50%) | NULL            |
| 2      | meaning 8 (100%) | NULL             | NULL            |
+--------+------------------+------------------+-----------------+

The intended meaning of the frequencies is "How many people think this meaning corresponds to that word"?

So it's not merely about counting "cases", but about counting persons in the table.

Amenhotep