tags:

views:

438

answers:

7

I have 3 tables in my DB (MySQL).

categories (name:string)
items (name:string, category_id:int)
votes (value:int, item_id:int, created_at:datetime)

So a category has many items, and an item has many votes.

I want to write a query to get the most popular categories, which means getting the categories whose items have gotten the most number of votes (both up or down) over the last week.

I started off trying something simpler, just getting popular items, but I'm really just guessing at this point and it doesn't work.

SELECT *, COUNT(votes.item_id) AS score
FROM items
JOIN votes USING(item_id)
WHERE votes.created_at > #{1.week.ago}
ORDER BY COUNT(votes.item_id) DESC LIMIT 5;

I don't really know what I'm doing, any ideas? Also, if anyone knows of a good write up on doing more advanced selects like this I'd love to read it. The MySQL documentation is a bit cryptic and I don't really understand 'AS' and 'JOINS'.

+2  A: 

try this. use group by with the name of the category. i have commented out the created at clause as you specified, you can uncomment it if you want to use it.

 SELECT c.name, SUM(ABS(v.item_id)) 
 FROM categories c,items i, votes v
 WHERE c.name = i.name
    AND i.item_id=v.item_id
    --AND v.created_at > #{1.week.ago}
 GROUP BY c.name 
 ORDER BY SUM(ABS(v.item_id)) DESC LIMIT 5;

you will notice that i did not use the JOIN keyword but instead filtered the results of the query using only WHERE clauses, which might be easier to understand. if you want to learn more about JOINs, here is a tutorial.

Here, too, is a tutorial on SQL aliases (the AS clause). in fact, there are a bunch more tutorials on this site for a bunch of different SQL topics that are not platform dependent.

edit: fixed as per comments, added the abs function,

akf
your order by needs a target
Hardwareguy
this solution seems to make the most sense to me, thanks!
Brian Armstrong
A: 
SELECT c.name, sum(v.value) as cnt
 FROM categories c
 JOIN items i ON i.category_id = c.id
 JOIN votes v ON v.item_id = i.id
 WHERE v.created_at > #{1.week.ago}
 GROUP BY c.name 
 ORDER BY cnt DESC LIMIT 5;

Edit: good point andrew, I fixed the query

Hardwareguy
this does not take into account the idea that the value of the vote could be greater than one...or a negative value
Andrew Siemer
add a having clause then
Egwor
A: 

This is in SQL Server...but easy to convert for a MySQL guy I would think!

select top 5
    c.name as CategoryName, sum(v.value) as VoteSum
from categories c inner join items i
    on c.category_id = i.category_id
    inner join votes v
     on i.item_id = v.item_id
where created_at between dateadd(week,-1,getdate()) and getdate()
group by c.name
order by sum(v.value) desc

select top 5 --number of categories to show in calculation

c.name as CategoryName, sum(v.value) as VoteSum --get the categoryname and the sum of votes

from categories c inner join items i --join the heirarchy: categories to items

on c.category_id = i.category_id --on category_id

inner join votes v --items to votes

on i.item_id = v.item_id --on item_id

where created_at between dateadd(week,-1,getdate()) and getdate() --specify the date range to include

group by c.name --group the results by the category

order by sum(v.value) desc --order the results by the summed value

Here is some sql (from SQL Server) to get the tables up and running:

CREATE TABLE [categories](
    [category_id] [int] IDENTITY(1,1) NOT NULL,
    [name] [varchar](50) NOT NULL
)

CREATE TABLE [items](
[item_id] [int] IDENTITY(1,1) NOT NULL,
[category_id] [int] NOT NULL,
[name] [varchar](50)
)

CREATE TABLE [dbo].[votes](
[vote_id] [int] IDENTITY(1,1) NOT NULL,
[value] [int] NOT NULL,
[item_id] [int] NOT NULL,
[created_at] [datetime] NOT NULL
)

insert into categories (name) values (' asp.net ')
insert into categories (name) values (' c# ')
insert into categories (name) values (' vb ')
insert into categories (name) values (' sql ')
insert into categories (name) values (' html ')
insert into categories (name) values (' javascript ')

insert into items (category_id, name) values (  1 ,' session handling ')
insert into items (category_id, name) values (  1 ,' mvc vs mvp ')
insert into items (category_id, name) values (  1 ,' code behind or no code behind ')
insert into items (category_id, name) values (  2 ,' LINQ? ')
insert into items (category_id, name) values (  2 ,' lamdas ')
insert into items (category_id, name) values (  2 ,' multi-threaded code ')
insert into items (category_id, name) values (  2 ,' SOLID principles ')
insert into items (category_id, name) values (  3 ,' vb vs C# ')
insert into items (category_id, name) values (  3 ,' VB.NET over vb6 ')
insert into items (category_id, name) values (  4 ,' CLR procedures or stored procedures ')
insert into items (category_id, name) values (  4 ,' ORMs vs stored procedures and views ')
insert into items (category_id, name) values (  6 ,' jquery instead of standard DOM ')

insert into votes (value, item_id, created_at) values ( -1 , 1 ,' 6/26/2009 18:59 ')
insert into votes (value, item_id, created_at) values ( 1 , 1 ,' 6/26/2009 18:59 ')
insert into votes (value, item_id, created_at) values ( 3 , 1 ,' 6/26/2009 18:59 ')
insert into votes (value, item_id, created_at) values ( 3 , 1 ,' 6/26/2009 18:59 ')
insert into votes (value, item_id, created_at) values ( 2 , 2 ,' 6/26/2009 18:59 ')
insert into votes (value, item_id, created_at) values ( 2 , 2 ,' 6/26/2009 18:59 ')
insert into votes (value, item_id, created_at) values ( 4 , 2 ,' 6/26/2009 18:59 ')
insert into votes (value, item_id, created_at) values ( -3 , 2 ,' 6/26/2009 18:59 ')
insert into votes (value, item_id, created_at) values ( 2 , 4 ,' 6/26/2009 19:00 ')
insert into votes (value, item_id, created_at) values ( 6 , 4 ,' 6/26/2009 19:00 ')
insert into votes (value, item_id, created_at) values ( 3 , 4 ,' 6/26/2009 19:00 ')
insert into votes (value, item_id, created_at) values ( 5 , 4 ,' 6/26/2009 19:00 ')
insert into votes (value, item_id, created_at) values ( 8 , 7 ,' 6/26/2009 19:00 ')
insert into votes (value, item_id, created_at) values ( 3 , 6 ,' 6/26/2009 19:00 ')
insert into votes (value, item_id, created_at) values ( 8 , 7 ,' 6/26/2009 19:01 ')
insert into votes (value, item_id, created_at) values ( 2 , 5 ,' 6/26/2009 19:01 ')
Andrew Siemer
he specifies mysql, isn't TOP 5 a sql server construct?
Hardwareguy
ye top is mssql. Just take off the top 5 and add limit 5 at the bottom.
corymathews
thanks for the detailed writeup Andrew I appreciate it!
Brian Armstrong
A: 

You could probably use group by in this case and remove the join. I always screw up when using group by but Something like

SELECT COUNT(votes.item_ID) AS score, 
  (SELECT ItemTitle FROM items WHERE items.item_id = votes.item_id) as Title
FROM votes
WHERE votes.created_at > #{1.week.ago}
Group By Title
Order By score
Limit 5

AS

"as" allows you to give something a name.

Notice above the as score, this gives the result from count(votes.item_id) the column name of score since it did not have a column name before. You could also use this if you want to call something by another name in the rest of the query.

If you took off the as score it would come back as a column with no title and no way to access it by name, only by number.

JOIN

A join will merge 2 tables as 1 temp table and return this table. There are inner, outer, left, right and cross joins. Each with its own advantages but all have the same problem of being slow. Look into sub queries to replace most joins.

You will also want to avoid using Select *, list out all the things you need.

The best way to figure these out is to just run them all and see what they return and read what they are supposed to do w3Schools Joins

corymathews
count should probably be sum, and as @Paul Chernoch points out, an abs is probably worthwhile
akf
thanks for the detailed response, I appreciate it!
Brian Armstrong
A: 

1) I assume the categories table also has category_id field, and the items table has an item_id or you won't be able to link the tables. 2) AS keyword is often optional

SELECT c.name, SUM(ABS(v.value))
FROM categories c
INNER JOIN items i ON c.category_id = i.category_id
INNER JOIN votes v ON v.item_id = i.item_id
WHERE v.created_at BETWEEN DATE_SUB(NOW(), INTERVAL -7  DAYS) AND NOW()
GROUP BY c.name
ORDER BY 2 DESC
LIMIT 10
  • The AS keyword is for making a synonym of the table name. Most times this is just a shorthand, but if you perform self-joins where the table is joined to itself, you need to distinguish them. And if two tables have the same field name, you need to specify which table's field you are using, thus c.category_id comes from the "c" table, which means the category table.
  • JOINS are essential. Start reading.
  • In my solution, I used DATE_SUB which is native to mySQL. I do not know how many other databases use that function, but all of them have something similar.
  • My query gives you the top ten categories with the most popular first. Note that the LIMIT N clause is how you do it in mySQL. In SQLServer, use TOP N after the SELECT keyword. In Oracle, it is done another way.
  • I took the absolute value of votes because you said include up or down votes.
  • ORDER BY 2 DESC means sort descending by the second column in the select. You can use the actual expression here, but that is more typing.
  • GROUP BY is necessary. Every column that is not a constant or aggregated with SUM, COUNT, MAX, etc. must appear in the GROUP BY clause if there are any aggregate functions used.
Paul Chernoch
regarding the TOP/LIMIT equivalent in Oracle: it is ROWNUM as in ROWNUM < 10
akf
A: 

This is what I ended up using (slightly modified version of @akf's answer). The other answers were great also, but this one seemed the simplest to me as a novice. I find it interesting that it doesn't use any joins? Would not have thought this was possible. I like how clean and simple it is.

SELECT c.*, SUM(ABS(v.vote)) AS score
FROM categories c,items i, votes v
  WHERE c.id = i.category_id
  AND i.id = v.item_id
  AND v.created_at > '#{1.week.ago}'
GROUP BY c.id
ORDER BY score DESC LIMIT 5;

(I forgot to mention in the question that each table has an 'id' column, which you can see being used here.)

Summing the absolute value of the vote value works in this case since there are positives and negatives as some posters correctly pointed out, but I was thinking it would be even simpler to just count the number of vote records in the database. I tried "COUNT(v) AS score" but this didn't work. If anyone knows how please post a comment. Thanks everyone!

Brian Armstrong
A: 

Sorry, my mistake, please ignore.

Chris Denman