tags:

views:

190

answers:

4

I have a table called trends_points, this table has the following columns:

  • id (the unique id of the row)
  • userId (the id of the user that has entered this in the table)
  • term (a word)
  • time (a unix timestamp)

Now, I'm trying to run a query on this table which will get the rows in a specific time frame ordered by how many times the column term appears in the table during the specific timeframe...So for example if the table has the following rows:

id | userId | term        | time
------------------------------------
1    28       new year      1262231638
2    37       new year      1262231658
3    1        christmas     1262231666
4    34       new year      1262231665
5    12       christmas     1262231667
6    52       twitter       1262231669

I'd like the rows to come out ordered like this:

new year
christmas
twitter

This is because "new year" exists three times in the timeframe, "christmas" exists twice and "twitter" is only in one row.

So far I've asummed it's a simple WHERE for the specific timeframe part of the query and a GROUP BY to stop the same term from coming up twice in the list.

This makes the following query:

SELECT * 
  FROM `trends_points` 
 WHERE ( time >= <time-period_start> 
  AND time <= <time-period_end> ) 
GROUP BY `term`

Does anyone know how I'd do the final part of the query? (Ordering the query's results by how many rows contain the same "term" column value..).

+1  A: 

COUNT() will give you the number of rows in the group, so just order by that.

SELECT * FROM `trends_points`
WHERE ( `time` >= <time-period_start> AND `time` <= <time-period_end> )
ORDER BY COUNT(`term`) DESC
GROUP BY `term`
Jordan
Perfect, thanks a lot!
Simon
Great! Also, consider OMG Ponies' note about BETWEEN.
Jordan
+1  A: 

Quick answer:

SELECT 
   term, count(*) as thecount
FROM 
   mytable
WHERE
   (...)
GROUP BY 
   term
ORDER BY 
   thecount DESC
gahooa
Brilliant, didn't realize I could use a COUNT() in an order by. Thanks a load!
Simon
+2  A: 
SELECT t.term
FROM trend_points t
WHERE t.time >= <time-period_start> AND t.time <= <time-period_end>
ORDER BY COUNT(t.term) DESC
GROUP BY t.term
danielrsmith
+2  A: 

Use:

  SELECT tp.term,
         COUNT(*) 'term_count'
    FROM TREND_POINTS tp
   WHERE tp.time BETWEEN <time-period_start> AND <time-period_end> 
GROUP BY tp.term
ORDER BY term_count DESC, tp.term

See this question about why to use BETWEEN vs using the >=/<= operators.

Keep in mind there can be ties - the order by defaults to alphabetically shorting by term value when this happens, but there could be other criteria.

Also, if you want to additionally limit the number of rows/terms coming back you can add the LIMIT clause to the end of the query. For example, this query will return the top five terms:

  SELECT tp.term,
         COUNT(*) 'term_count'
    FROM TREND_POINTS tp
   WHERE tp.time BETWEEN <time-period_start> AND <time-period_end> 
GROUP BY tp.term
ORDER BY term_count DESC, tp.term
   LIMIT 5
OMG Ponies