views:

68

answers:

2

I have a large table (~1M rows now, soon ~10M) that has two ranked columns (in addition to the regular data):

  • avg_visited, a float 0-1 representing a %age popularity; higher is better
  • alexa_rank, an integer 1-N giving an a priori ranking

The a priori ranking is from external sources so can't be changed. Many rows have no popularity yet (as no user has yet hit it), so the a priori ranking is the fallback ordering. The popularity however does change very frequently - both to update old entries and to add a popularity to ones that previously only had the a priori ranking, if some user actually hits it.

I frequently run SELECT id, url, alexa_rank, avg_visited FROMsitesORDER BY avg_visited desc, alexa_rank asc LIMIT 49500, 500 (for various values of 49500).

However, ORDER BY cannot use an index with mixed ascendency per http://dev.mysql.com/doc/refman/5.0/en/order-by-optimization.html

This is in mysql 5.1, innodb.

How can I best change this situation to give me a sane, fully indexed query?

+1  A: 

Unfortunately, MySQL does not support DESC clauses in the indexes, neither does it support indexes on derived expressions.

You can store the negative popularity along with the positive one and use it in the ORDER BY:

CREATE INDEX ix_mytable_negpopularity_apriori ON (neg_popularity, a_priori);

INSERT
INTO    mytable (popularity, neg_popularity)
VALUES  (@popularity, -@popularity);

SELECT  *
FROM    mytable
ORDER BY
        neg_popularity, a_priori
Quassnoi
That's the first thing I thought of too. I'm wondering if there's a nicer solution.
Sai Emrys
@Sai: it's hardly probable (with `MySQL`). In other engines, you can create an index on `(popularity DESC, a_priori)`, but not in `MySQL`.
Quassnoi
Also, FWIW, I have to use a covering index - i.e. on `(neg_popularity, a_priori, common_data_1, common_data2)` - because a simple index does not work with order by if there are other fields in the select. :-/
Sai Emrys
@Sai: a simple index works with `ORDER BY` alright. It's the optimizer that can be reluctant to use it for some reason, but you can force its usage with `FORCE INDEX`. Do you have a `LIMIT` clause in your query? Could you please post the full query?
Quassnoi
I *always* use LIMIT. Full query added to question.
Sai Emrys
@Sai: For higher values of `49500` sorting can be faster than traversing the index.
Quassnoi
+1  A: 

Just a simple hack:

Since since popularity is a float between 0 to 1. You can multiply it by -1 and the number will be between -1 to 0.

This way you can reverse the sort order of popularity to ORDER BY popularity ASC, a_priori ASC

Not sure the overhead out weighs the gain.

This reminds me of the hack of storing emails in reverse form.

Yada
This is identical to Quassnoi's, so I'm giving him the check for being first (unless someone comes up with a better solution).What hack of storing emails in reverse form? I don't know it.
Sai Emrys