views:

57

answers:

3

This question asks how to select a user's rank by his id.

id     name     points
1      john     4635
3      tom      7364
4      bob      234
6      harry    9857

The accepted answer is

SELECT  uo.*, 
        (
        SELECT  COUNT(*)
        FROM    users ui
        WHERE   (ui.points, ui.id) >= (uo.points, uo.id)
        ) AS rank
FROM    users uo
WHERE   id = @id

which makes sense. I'd like to understand what the performance tradeoffs would be, between this approach, or by modifying the db structure to store a calculated rank (I guess that would require massive changes every time there's a rank change), or any other approaches that I'm too newb to think of. I'm a db noob.

+1  A: 

The performance tradeoff would basically be what you described:

If you modified the structure to store a rank, queries would be very, very simple and fast. However, this would require some overhead any time "points" changed, as you'd have to verify that the rank hasn't changed. If the ranking had changed, you'd have to do multiple updates.

This causes more work (with the potential for bugs) at every update/insert. The tradeoff is very fast reads. If you're typical usage is very few modifications compared to millions of reads, AND you found this query to be a bottleneck, it might be worth considering reworking this. However, I would avoid the added complexity and maintainability headaches unless you truly found this to be a problem, since the current solution requires less storage, and is very flexible.

Reed Copsey
A: 

Does not the 'where' portion of that query internally require reading the entire table? I understand about premature optimization. Academically, it seems that this wouldn't scale further than a few thousand rows.

Dustin Getz
+1  A: 

The link you reference is a MySQL question. If the original database had been Oracle the accepted answer would be to use an analytic function, which does scale, very nicely:

SQL> select id, name, points from users order by id
  2  /

        ID NAME           POINTS
---------- ---------- ----------
         1 john             4635
         3 tom              7364
         4 bob               234
         6 harry            9857
         8 algernon            1
         9 sebastian         234
        10 charles           888

7 rows selected.

SQL> select name, id, points, rank() over (order by points)
  2  from users
  3  /

NAME               ID     POINTS RANK()OVER(ORDERBYPOINTS)
---------- ---------- ---------- -------------------------
algernon            8          1                         1
bob                 4        234                         2
sebastian           9        234                         2
charles            10        888                         4
john                1       4635                         5
tom                 3       7364                         6
harry               6       9857                         7

7 rows selected.

SQL> select name, id, points, dense_rank() over (order by points desc)
  2  from users
  3  /

NAME               ID     POINTS DENSE_RANK()OVER(ORDERBYPOINTSDESC)
---------- ---------- ---------- -----------------------------------
harry               6       9857                                   1
tom                 3       7364                                   2
john                1       4635                                   3
charles            10        888                                   4
bob                 4        234                                   5
sebastian           9        234                                   5
algernon            8          1                                   6

7 rows selected.

SQL>
APC