tags:

views:

418

answers:

5

I need to find the most popular occurrence of an item grouped by date and display the total of all items along with the name of this item. Is something like this possible in a single query?

note: If they are all of equal occurrence (see last 3 rows in Insert) than I can just show at random or the first or last occurrence (whichever is easiest).

If this can't be done in the sql then will have to run through the result and sort the data via PHP which seems like it'll be quite messy.

Edit: Sorry, I had the wrong total for '2009'08-04'. It should be 4.

An example of what I need:

+------------+---------------------+-------+
| date       | item                | total |
+------------+---------------------+-------+
| 2009-08-02 | Apple               |     5 |
| 2009-08-03 | Pear                |     2 |
| 2009-08-04 | Peach               |     4 |
| 2009-08-05 | Apple               |     1 |
| 2009-08-06 | Apple               |     3 |
+------------+---------------------+-------+

Here's an example table:

CREATE TABLE IF NOT EXISTS `test_popularity` (
  `date` datetime NOT NULL,
  `item` varchar(256) NOT NULL,
  `item_id` int(11) NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

INSERT INTO `test_popularity` (`date`, `item`, `item_id`) VALUES
('2009-08-02 00:00:00', 'Apple', 1),
('2009-08-02 00:00:00', 'Pear', 3),
('2009-08-02 00:00:00', 'Apple', 1),
('2009-08-02 00:00:00', 'Apple', 1),
('2009-08-02 00:00:00', 'Pear', 0),
('2009-08-03 00:00:00', 'Pear', 3),
('2009-08-03 00:00:00', 'Peach', 2),
('2009-08-04 00:00:00', 'Apple', 1),
('2009-08-04 00:00:00', 'Peach', 2),
('2009-08-04 00:00:00', 'Peach', 2),
('2009-08-04 00:00:00', 'Pear', 3),
('2009-08-05 00:00:00', 'Apple', 1),
('2009-08-06 00:00:00', 'Apple', 1),
('2009-08-06 00:00:00', 'Peach', 2),
('2009-08-06 00:00:00', 'Pear', 3);
A: 

What about using GROUP BY :

select date, item_id, item, count(*) as num
from test_popularity
group by date, item_id, item
order by count(*) desc
limit 0, 1

It'll get you :

+---------------------+---------+-------+-----+
| date                | item_id | item  | num |
+---------------------+---------+-------+-----+
| 2009-08-02 00:00:00 |       1 | Apple |   3 |
+---------------------+---------+-------+-----+
1 row in set (0.01 sec)

And, without the limit clause :

mysql> select date, item_id, item, count(*) as num
    -> from test_popularity
    -> group by date, item_id, item
    -> order by count(*) desc
    -> ;
+---------------------+---------+-------+-----+
| date                | item_id | item  | num |
+---------------------+---------+-------+-----+
| 2009-08-02 00:00:00 |       1 | Apple |   3 |
| 2009-08-04 00:00:00 |       2 | Peach |   2 |
| 2009-08-04 00:00:00 |       3 | Pear  |   1 |
| 2009-08-05 00:00:00 |       1 | Apple |   1 |
| 2009-08-02 00:00:00 |       3 | Pear  |   1 |
| 2009-08-06 00:00:00 |       1 | Apple |   1 |
| 2009-08-02 00:00:00 |       0 | Pear  |   1 |
| 2009-08-06 00:00:00 |       2 | Peach |   1 |
| 2009-08-03 00:00:00 |       3 | Pear  |   1 |
| 2009-08-06 00:00:00 |       3 | Pear  |   1 |
| 2009-08-03 00:00:00 |       2 | Peach |   1 |
| 2009-08-04 00:00:00 |       1 | Apple |   1 |
+---------------------+---------+-------+-----+
12 rows in set (0.00 sec)

Of course, you will only get the number of items that correspond to the group by clause...

To get the total number, you'll need :

  • either to iterate through the result set in PHP, calculating the sum (OK if you don't have too many results)
  • or do another query.

I don't think there is a way to get both informations (number per day, and total number) in once query...

Using another query would be like this :

mysql> select item_id, item, count(*) as num
    -> from test_popularity
    -> group by item_id, item
    -> ;
+---------+-------+-----+
| item_id | item  | num |
+---------+-------+-----+
|       0 | Pear  |   1 |
|       1 | Apple |   6 |
|       2 | Peach |   4 |
|       3 | Pear  |   4 |
+---------+-------+-----+
4 rows in set (0.00 sec)

And, now, you just have to "merge" the results from both queries, in PHP.

This doesn't seem to be do-able in plain SQL... Or maybe with sub-queries, but I guess that wouldn't be too good performance-wise...


EDIT : and in only one query using a sub-query to get the total number :

mysql> select date, item_id, item, count(*) as num, (
    -> select count(*)
    -> from test_popularity sub_table
    -> where sub_table.item_id = test_popularity.item_id
    -> ) as total_num
    -> from test_popularity
    -> group by date, item_id, item
    -> order by count(*) desc;
+---------------------+---------+-------+-----+-----------+
| date                | item_id | item  | num | total_num |
+---------------------+---------+-------+-----+-----------+
| 2009-08-02 00:00:00 |       1 | Apple |   3 |         6 |
| 2009-08-04 00:00:00 |       2 | Peach |   2 |         4 |
| 2009-08-06 00:00:00 |       2 | Peach |   1 |         4 |
| 2009-08-04 00:00:00 |       1 | Apple |   1 |         6 |
| 2009-08-06 00:00:00 |       3 | Pear  |   1 |         4 |
| 2009-08-02 00:00:00 |       3 | Pear  |   1 |         4 |
| 2009-08-04 00:00:00 |       3 | Pear  |   1 |         4 |
| 2009-08-02 00:00:00 |       0 | Pear  |   1 |         1 |
| 2009-08-05 00:00:00 |       1 | Apple |   1 |         6 |
| 2009-08-03 00:00:00 |       3 | Pear  |   1 |         4 |
| 2009-08-06 00:00:00 |       1 | Apple |   1 |         6 |
| 2009-08-03 00:00:00 |       2 | Peach |   1 |         4 |
+---------------------+---------+-------+-----+-----------+
12 rows in set (0.00 sec)

With this, you get :

  • Data about each item
  • number of times each item is in the table, per day
  • and, for each item, total number of times it is in the whole table, independantly of the day
Pascal MARTIN
He want's to squash the dates together, so there should be only one of each item in the output.
Dana the Sane
In the example he provided, "Apple" is present 3 times in the output (once per day), so I believe there can be more than 1 of each item in the output -- am I wrong ?(Note that I've edited my post a couple of times, as I was trying new ideas)
Pascal MARTIN
Yes, you are wrong. He asked for it "grouped by date" and his example showed it "grouped by date". Yours doesn't.
hobodave
Ergh, yep, I am indeed wrong :-(
Pascal MARTIN
+2  A: 

My initial suggestion was incorrect:

SELECT
  date, item, SUM(cnt)
FROM (
  SELECT
    date, item, count(item_id) AS cnt
  FROM test_popularity
  GROUP BY date, item_id
  ORDER BY cnt DESC
) t
GROUP BY date;

This erroneously assumes that the outside aggregation (by date) will select the first row of the inner derived table which was ordered by cnt. This behavior is, in fact, undefined and not guaranteed to be consistent.

Here is the proper solution:

SELECT
  t1.date, t1.item, 
  (SELECT COUNT(*) FROM test_popularity WHERE date = t1.date) as total
  # see note!
FROM test_popularity t1
JOIN (
  SELECT date, item, item_id, COUNT(item_id) as count
  FROM test_popularity
  GROUP BY date, item_id
) AS t2
ON t1.date = t2.date AND t1.item_id = t2.item_id
GROUP BY t1.date;

Note:

I added the (SELECT COUNT(*)) AS total because the question asked for this in one query. However, this will not scale as it is a correlated subquery. This means that for every t1.date the SELECT COUNT(*) subquery will run. Please benchmark and see if it performs suitably for your needs. If not, then I suggest getting the daily totals in a separate query. You would merge these results in your application.

hobodave
+1 : simple, nice looking... And you understood the question ^^
Pascal MARTIN
This will give the count of the item but not the total items for that date.
kenitech
Thanks hobbodave. I tried to be as clear as possible with the outputted data.A colleague of mine just pointed out that I need another subselect to count the total for the date. I will post this and answer myself. I'm a n00b on Overflow so please forgive me.
kenitech
I didn't get around to running your updated query until just now. It is not producing the correct output. In this case,the entry for '2009-08-04' should show that 'Peach' is the most popular.
kenitech
A: 

thanks to hohodave for his initial response:

SELECT date, item, cnt, (
SELECT COUNT( * )
FROM test_popularity
WHERE date = t.date
) AS totalCnt
FROM (
SELECT date, item, count( item_id ) AS cnt
FROM test_popularity
GROUP BY date, item_id
ORDER BY cnt DESC
)t
GROUP BY date;
kenitech
+1  A: 

hobodave's answer is correct. I can't vote. I also just logged in with my account (instead of the initial cookie login) and can no longer edit anything on this post.

kenitech
You should still be able to click the checkmark next to my answer to accept it.
hobodave
I fixed your account(s) for you.
Marc Gravell
A: 

This is as close as I could get....

SELECT DISTINCT p.date, ItemTotalsByDate.Item, DateTotals.Total
    FROM test_popularity p
    INNER JOIN 
(SELECT date, MAX(cnt) DayMax from
(SELECT date, item, COUNT(*) cnt
FROM dbo.test_popularity
GROUP BY date, item) tbl
GROUP BY date) MaxesByDate
    ON p. date = MaxesByDate.date
INNER JOIN 
(SELECT date, item, COUNT(*) Total FROM dbo.test_popularity
GROUP BY date, item) ItemTotalsByDate
    ON MaxesByDate.date = ItemTotalsByDate.date AND MaxesByDate.DayMax = ItemTotalsByDate.Total
INNER JOIN
(SELECT date, COUNT(*) Total FROM dbo.test_popularity
GROUP BY date) DateTotals
ON p.date = DateTotals.date

The only thing this leaves for your PHP to do is only display the first result it finds for a given date. I couldn't figure out a good way to arbitrarily pick one item when it was a tie. Hope this helps.

thinkzig
Caveat: I did this in SQL Server cuz I didn't have a mySQL instance handy. I'm pretty sure the subquery syntax I used will work though.
thinkzig
This does not group by date. It's also needlessly complex with 4 derived tables.
hobodave