ansaurus

Question

Ranking in MySQL, how do I get the best performance with frequent updates and a large data set?

Answer 1

+2 A:

You need a stored procedure to be able to call this with parameters:

CREATE TABLE rank (name VARCHAR(20) NOT NULL, points INTEGER NOT NULL);

CREATE INDEX ix_rank_points ON rank(points, name);

CREATE PROCEDURE prc_ranks(fromrank INT, tillrank INT)
BEGIN
  SET @fromrank = fromrank;
  SET @tillrank = tillrank;
  PREPARE STMT FROM
  '
  SELECT  rn, rank, name, points
  FROM  (
    SELECT  CASE WHEN @cp = points THEN @rank ELSE @rank := @rn + 1 END AS rank,
            @rn := @rn + 1 AS rn,
            @cp := points,
            r.*
    FROM (
         SELECT @cp := -1, @rn := 0, @rank := 1
         ) var,
         (
         SELECT *
         FROM rank
         FORCE INDEX (ix_rank_points)
         ORDER BY
           points DESC, name DESC
         LIMIT ?
         ) r
    ) o
  WHERE rn >= ?
  ';
  EXECUTE STMT USING @tillrank, @fromrank;
END;

CALL prc_ranks (2, 5);

If you create the index and force MySQL to use it (as in my query), the complexity of the query will not depend on the total number of rows at all; it will depend only on tillrank.

It will actually take the last tillrank values from the index, perform some simple calculations on them, and filter out the rows that come before fromrank.

The time of this operation, as you can see, depends only on tillrank; it does not depend on how many records there are.
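The scan described above can be modelled outside the database. The following is only a rough Python sketch of the same logic, not the stored procedure itself: the function name ranks and the sample rows are made up for illustration, and the sorted() call merely stands in for the ordered read that the index provides in MySQL.

```python
def ranks(rows, fromrank, tillrank):
    """Model of prc_ranks. rows is a list of (name, points) tuples."""
    # Stand-in for: ORDER BY points DESC, name DESC LIMIT tillrank.
    # In MySQL the index supplies this order; only the loop below
    # corresponds to work that grows with tillrank.
    top = sorted(rows, key=lambda r: (r[1], r[0]), reverse=True)[:tillrank]

    result = []
    cp, rn, rank = None, 0, 0      # play the roles of @cp, @rn, @rank
    for name, points in top:
        if points != cp:
            rank = rn + 1          # new score: rank jumps to the row number
        rn += 1                    # running row number
        cp = points                # remember the previous row's points
        if rn >= fromrank:         # outer WHERE rn >= ?
            result.append((rn, rank, name, points))
    return result


# Hypothetical sample data, not taken from the question.
sample = [("ann", 50), ("bob", 50), ("cat", 40),
          ("dan", 30), ("eve", 30), ("fay", 10)]
print(ranks(sample, 2, 5))
```

With this sample, rows 2 to 5 come back as ann (rank 1), cat (rank 3), eve (rank 4) and dan (rank 4): ties share a rank and the next distinct score skips ahead to its row number.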

I just checked it on 400,000 rows: it selects ranks 5 to 100 in 0.004 seconds (that is, instantly).

Important: this only works if you sort on names in DESCENDING order. MySQL does not support the DESC clause in indexes (this is true of the MySQL 5.x versions discussed here), which means that points and name must be sorted in the same direction for the index sort to be usable (either both ASCENDING or both DESCENDING). If you want fast ASC sorting by name, you will need to keep negative points in the database and change the sign in the SELECT clause.
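To make the negative-points trick concrete, here is a small sketch. It runs against SQLite through Python's sqlite3 module purely so that it is self-contained; it is not MySQL, and the table rank_neg, the column neg_points, the index ix_rank_neg and the helper top_rows are all names invented for this example. The point is only that once the sign is flipped, a single ascending order on (neg_points, name) yields highest points first with names ascending.

```python
import sqlite3


def top_rows(scores, limit):
    """Return the first `limit` rows ordered by points DESC, name ASC."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE rank_neg (name TEXT NOT NULL, neg_points INTEGER NOT NULL)"
    )
    # Both index columns ascending, so one direction serves the whole sort.
    conn.execute("CREATE INDEX ix_rank_neg ON rank_neg (neg_points, name)")
    # Store the points negated.
    conn.executemany(
        "INSERT INTO rank_neg VALUES (?, ?)", [(n, -p) for n, p in scores]
    )
    rows = conn.execute(
        "SELECT name, -neg_points AS points "   # flip the sign back on output
        "FROM rank_neg ORDER BY neg_points, name LIMIT ?",
        (limit,),
    ).fetchall()
    conn.close()
    return rows


# Hypothetical sample data.
scores = [("cat", 40), ("ann", 50), ("bob", 50), ("dan", 30), ("eve", 30)]
print(top_rows(scores, 4))
```

In MySQL the same idea would apply to the inner SELECT of the procedure: order by the negated column and by name, both ascending, and negate again in the select list.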

You may also remove name from the index altogether and perform the final ordering without using an index:

CREATE INDEX ix_rank_points ON rank(points);

CREATE PROCEDURE prc_ranks(fromrank INT, tillrank INT)
BEGIN
  SET @fromrank = fromrank;
  SET @tillrank = tillrank;
  PREPARE STMT FROM
  '
  SELECT  rn, rank, name, points
  FROM  (
    SELECT  CASE WHEN @cp = points THEN @rank ELSE @rank := @rn + 1 END AS rank,
            @rn := @rn + 1 AS rn,
            @cp := points,
            r.*
    FROM (
         SELECT @cp := -1, @rn := 0, @rank := 1
         ) var,
         (
         SELECT *
         FROM rank
         FORCE INDEX (ix_rank_points)
         ORDER BY
           points DESC
         LIMIT ?
         ) r
    ) o
  WHERE rn >= ?
  ORDER BY rank, name
  ';
  EXECUTE STMT USING @tillrank, @fromrank;
END;

That will impact performance on big ranges, but you will hardly notice it on small ranges.

Quassnoi 2009-02-16 20:04:20

Looks very nice. What is the complexity of this query? Is it within log(n) range, and if so, can you explain how? One thing is missing, though: sorting on the name as second priority if two rows have the same number of points.

Per Stilling 2009-02-16 20:48:16

My ranking works like this: those with the most points have rank 1, and people with the same number of points are sorted by name. So if two people have the maximum points, they both have rank 1, while the next row is recorded as rank 3.

Per Stilling 2009-02-16 20:50:21

Why "De" has rank 5 then?

Quassnoi 2009-02-16 20:56:21

That's a typo, very sorry.

Per Stilling 2009-02-16 21:08:37

Thank you for the great response!

Per Stilling 2009-02-16 21:57:26

I did some testing, and as far as I can see, the time does increase as soon as you choose small intervals at high ranks.

Per Stilling 2009-02-16 22:11:06

I think it is due to the LIMIT clause. Are you sure it can be optimized? I thought LIMIT was a convenience clause and wasn't optimized in any way. I did a 50-record request at offset 90000-90050 and it took a second (that's on a fairly slow computer, though), but it took 200 ms to get ranks 1-50.

Per Stilling 2009-02-16 22:11:51

That's exactly what I said; read my post carefully. It depends NOT on the interval itself, but on tillrank (the higher of your bounds). It cannot be optimized further, because when you need to take records from 999,900 to 1,000,000, you need to look at the whole million.

Quassnoi 2009-02-16 22:28:22

And did you create the index and force it in the query?

Quassnoi 2009-02-16 22:29:18

Ah, and ordering by name also matters. The index will work only if points and names are sorted in the same direction, as MySQL does not support the DESC clause in indexes. You'll need to keep NEGATIVE points in the database and change the sign in the query if you want to sort on names in ascending order.

Quassnoi 2009-02-16 22:32:56

Hmm, OK. I understood this as if it didn't matter how many records there were, only how many you want to fetch: "Time of this operation, as you can see, depends only on how many records you need, it does not depend on how many records are there."

Per Stilling 2009-02-16 22:34:07

I created your algorithm on my data, and created the index you also created and forced it exactly like you do. I do, however, think it should be possible to do the operation in log(n) time. I will elaborate in my question how I think it can be implemented.

Per Stilling 2009-02-16 22:39:13

Did you change the name ordering from DESC to ASC? And what MySQL version are you using?

Quassnoi 2009-02-16 22:42:45

If you use the index, this operation will take LOG(n) time. N here means the higher bound of your range, not the overall number of records.

Quassnoi 2009-02-16 22:43:58

OK, I will always be using a constant range; my problem is scaling with the total number of records. I've posted an explanation of the solution I assume could work in log(n) time. I have not changed the name ordering, and I believe you that within the query range the query you made works in log(n).

Per Stilling 2009-02-16 22:51:40

MySQL Server version: 5.0.51

Per Stilling 2009-02-16 22:56:46

See the updated post. It's very strange that you get 200 ms on (1, 50); with an index this should work instantly. I'm checking on MySQL 5.1.28; maybe FORCE INDEX support is broken in 5.0.51.

Quassnoi 2009-02-16 23:01:42

Try to issue this: SELECT * FROM rank FORCE INDEX (ix_rank_points) LIMIT 10 and see how long it takes. It should return the 10 rows with your LEAST points and should take less than 10 milliseconds. If it doesn't, then something is wrong with FORCE INDEX.

Quassnoi 2009-02-16 23:03:48

Here is my database; maybe the fact that it is InnoDB makes it slower. http://pastebin.com/m655e99ea

Per Stilling 2009-02-16 23:09:05

Sorry if it's a stupid question, but did you replace "ix_rank_points" with "points2" in the query?

Quassnoi 2009-02-16 23:11:38

That query takes 0.0003 seconds

Per Stilling 2009-02-16 23:11:39

http://pastebin.com/m49c0ba13 here's the function I use

Per Stilling 2009-02-16 23:12:39

However, I figured out why it took 200 ms: it's the EMS SQL query tool I have, which adds traffic delay. So 200 ms is probably instant.

Per Stilling 2009-02-16 23:13:57

And how long does this function take on (0, 50) and on (200000, 200050)?

Quassnoi 2009-02-16 23:14:34

Can you look at the pseudocode I wrote? Do you know whether count and median can be computed in log(n) or constant time using indexes?

Per Stilling 2009-02-16 23:15:03
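The pseudocode mentioned in this comment is not part of this page, so the following is only a guess at the kind of operation meant: finding the rank of one score by counting how many scores are strictly higher. Over a plain sorted list that count is a binary search, as this Python sketch shows. The function name rank_of and the sample list are invented for illustration, and the sketch does not settle whether a MySQL index can answer the same count without scanning the range.

```python
from bisect import bisect_right


def rank_of(sorted_points, points):
    """Rank of a score, given all scores in ascending order.

    Rank is 1 plus the number of strictly higher scores, so ties share
    a rank and the next distinct score skips ahead.
    """
    # bisect_right finds the position just past the last score <= points
    # in O(log n) comparisons; everything after it is strictly higher.
    higher = len(sorted_points) - bisect_right(sorted_points, points)
    return higher + 1


# Hypothetical sample data, already sorted ascending.
all_points = [10, 30, 30, 40, 50, 50]
print([rank_of(all_points, p) for p in (50, 40, 30, 10)])
```

For this sample the ranks are 1, 3, 4 and 6, matching the tie handling described earlier in the thread.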

Post a new question, I'll look at it tomorrow.

Quassnoi 2009-02-16 23:15:55

100k takes about 800 ms; I don't have more than 100k in the database at the moment. The 0-50 is instant.

Per Stilling 2009-02-16 23:19:08

Okay I will do that.

Per Stilling 2009-02-16 23:19:46
