I have a MySQL table with many rows. The table has a popularity column. If I sort by popularity, I can get the rank of each item. Is it possible to retrieve the rank of a particular item without sorting the entire table? I don't think so. Is that correct?

An alternative would be to create a new column for storing rank, sort the entire table, and then loop through all the rows and update the rank. That is extremely inefficient. Is there perhaps a way to do this in a single query?

+4  A: 

There is no way to calculate the order (what you call rank) of something without first sorting the table or storing the rank.

If your table is properly indexed (an index on popularity), however, the sort is cheap for the database, so you can get your rank easily. I'd suggest something like the following:

Select all, including rank

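SET @rank := 0;
-- `table` is a placeholder name; it is a reserved word, so it needs backticks.
-- DESC makes the most popular row rank 1.
SELECT t.*, @rank := @rank + 1 AS item_rank
FROM `table` t
ORDER BY t.popularity DESC;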

To fetch the rank of an item with a specific "id", you can wrap that query in a subquery as follows:

Select one, including rank

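SET @rank := 0;
-- The derived table ranks every row (most popular first); the outer query keeps one id.
SELECT * FROM (
  SELECT t.*, @rank := @rank + 1 AS item_rank
  FROM `table` t
  ORDER BY t.popularity DESC
) t2
WHERE t2.id = 1;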
hobodave
I think a better way to calculate the rank of a single record is a correlated subquery: a COUNT(*) over the rows WHERE t.popularity > t1.popularity.
zerkms
Correlated subqueries should _usually_ be avoided. Derived tables are _almost always_ more performant in MySQL.
hobodave
This was a good solution. I didn't know you could increment a variable by rows like that. That is a nice trick.
bigmac
+1 because this is exactly what I needed. Thank you, hobodave.
Andrew Heath
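
The counting approach suggested in zerkms's comment above can be sketched as follows. This is only an illustration under stated assumptions: the `items` table, its sample rows, and the `rank_of` helper are hypothetical, and SQLite (through Python's standard `sqlite3` module) is used only so the sketch runs on its own. The same SELECT should carry over to MySQL, but it has not been profiled against the derived-table version.

```python
import sqlite3

# In-memory SQLite database so the sketch is self-contained.
# The `items` table and its rows are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, popularity INTEGER)")
conn.executemany(
    "INSERT INTO items (id, popularity) VALUES (?, ?)",
    [(1, 50), (2, 90), (3, 10), (4, 70)],
)
# An index on popularity lets the count be answered from the index.
conn.execute("CREATE INDEX idx_items_popularity ON items (popularity)")


def rank_of(conn, item_id):
    """Return 1 + the number of rows more popular than the given item.

    Returns None when no row has that id.
    """
    row = conn.execute(
        """
        SELECT (SELECT COUNT(*)
                FROM items AS t
                WHERE t.popularity > t1.popularity) + 1
        FROM items AS t1
        WHERE t1.id = ?
        """,
        (item_id,),
    ).fetchone()
    return row[0] if row else None


ranks = {item_id: rank_of(conn, item_id) for item_id in (1, 2, 3, 4)}
```

With this definition, rows that tie on popularity share the same rank, and no session variable is needed.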
A: 

You are right that the second approach is inefficient if the rank column is recalculated on every table read. However, depending on how often the table is updated, you could calculate the rank on every update and store it, which is a form of caching. You are then turning a calculated field into a stored field.

This video covers caching in MySQL. Although it is Rails-specific and shows a slightly different form of caching, the strategy is very similar.
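
A minimal sketch of the stored-rank idea from this answer, assuming a hypothetical `items` table with a `cached_rank` column, and hypothetical helpers `refresh_ranks` and `set_popularity`. SQLite (through Python's standard `sqlite3` module) is used only so the example runs standalone. It recomputes every rank after each popularity change, which is the simplest variant and not necessarily the cheapest.

```python
import sqlite3

# In-memory SQLite database; table, column, and helper names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items ("
    "id INTEGER PRIMARY KEY, popularity INTEGER, cached_rank INTEGER)"
)
conn.executemany(
    "INSERT INTO items (id, popularity) VALUES (?, ?)",
    [(1, 50), (2, 90), (3, 10)],
)


def refresh_ranks(conn):
    """Recompute cached_rank for every row: one sorted read, one batched write."""
    ids = [
        row[0]
        for row in conn.execute(
            "SELECT id FROM items ORDER BY popularity DESC, id"
        )
    ]
    conn.executemany(
        "UPDATE items SET cached_rank = ? WHERE id = ?",
        [(position, item_id) for position, item_id in enumerate(ids, start=1)],
    )
    conn.commit()


def set_popularity(conn, item_id, popularity):
    """Change one row's popularity, then refresh the stored ranks."""
    conn.execute(
        "UPDATE items SET popularity = ? WHERE id = ?", (popularity, item_id)
    )
    refresh_ranks(conn)


refresh_ranks(conn)
before = dict(conn.execute("SELECT id, cached_rank FROM items"))
set_popularity(conn, 3, 100)  # the least popular item jumps to the top
after = dict(conn.execute("SELECT id, cached_rank FROM items"))
```

Reads then become a plain lookup of `cached_rank`, while each write pays for a full re-rank. In this example a single popularity change alters the stored rank of every row, so the trade-off depends on how often popularity changes compared with how often ranks are read.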

timmow
A: 

hobodave's solution is very good. Alternatively, you could add a separate rank column and then, whenever a row's popularity is UPDATEd, query to determine whether that popularity update changed its ranking relative to the row above and below it, then UPDATE the 3 rows affected. You'd have to profile to see which method is more efficient.

Brock Batsell
This is not as simple as you describe. There would quite often be _many_ affected rows. You seem to be assuming that an item's popularity would only change by +/- 1 unit, and you don't account for larger jumps. In fact, in the worst case of moving something to the top of the list, you'd have to update every row in the table.
hobodave
Another thing, an UPDATE will _always_ take longer than a SELECT.
hobodave
All true; I was operating on the assumption that popularity would only change incrementally.
Brock Batsell
A: 

If you are using an InnoDB table, you may consider clustering the table on the popularity column, but only if ordering by popularity is a frequent query. (In InnoDB the clustered index is the primary key, so popularity would have to lead the primary key.) The decision also depends on how varied the popularity values are (a range of 0 - 3 is not a good candidate).

You can look at this information on clustered indexes to see if this works for your case: http://msdn.microsoft.com/en-us/library/ms190639.aspx

This refers to SQL Server, but the concept is the same; also look up the MySQL documentation on this.

pkrish