views:

393

answers:

4

I have a table that I would like to be able to present "ranked X out of Y" data for. In particular, I'd like to be able to present that data for an individual row in a relatively efficient way (i.e. without selecting every row in the table). The ranking itself is quite simple, it's a straight ORDER BY on a single column in the table.

Postgres seems to present some unique challenges in this regard; AFAICT it doesn't have a RANK or ROW_NUMBER or equivalent function (at least in 8.3, which I'm stuck on for the moment). The canonical answer in the mailing list archives seems to be to create a temporary sequence and select from it:

test=> create temporary sequence tmp_seq;
CREATE SEQUENCE
test=*> select nextval('tmp_seq') as row_number, col1, col2 from foo;

It seems like this solution still won't help when I want to select just a single row from the table (and I want to select it by PK, not by rank).

I could denormalize and store the rank in a separate column, which makes presenting the data trivial, but just relocates my problem. UPDATE doesn't support ORDER BY, so I'm not sure how I'd construct an UPDATE query to set the ranks (short of selecting every row and running a separate UPDATE for each row, which seems like way too much DB activity to trigger every time the ranks need updating).

Am I missing something obvious? What's the Right Way to do this?

EDIT: Apparently I wasn't clear enough. I'm aware of OFFSET/LIMIT, but I don't see how it helps solve this problem. I'm not trying to select the Xth-ranked item, I'm trying to select an arbitrary item (by its PK, say), and then be able to display to the user something like "ranked 43rd out of 312."

+1  A: 

ROW_NUMBER functionality in PostgreSQL is implemented via LIMIT n OFFSET skip.

EDIT: Since you are asking for ROW_NUMBER() instead of simple ranking: row_number() is introduced to PostgreSQL in version 8.4. So you might consider to update. Otherwise this workaround might be helpful.

The Chairman
I'm aware of duplicate issues and taking precautions. This does not answer my question at all. Using LIMIT and OFFSET is easy enough, but doesn't give me a ranking number to display on the site ("this item ranked 43rd out of 312"), which is the whole point.
Carl Meyer
+2  A: 

Isn't it just this:

SELECT  *
FROM    mytable
ORDER BY
        col1
OFFSET X LIMIT 1

Or I am missing something?

Update:

If you want to show the rank, use this:

SELECT  mi.*, values[1] AS rank, values[2] AS total
FROM    (
        SELECT  (
                SELECT  ARRAY[SUM(((mi.col1, mi.ctid) < (mo.col1, mo.ctid))::INTEGER), COUNT(*)]
                FROM    mytable mi
                ) AS values
        FROM    mytable mo
        WHERE   mo.id = @myid
        ) q
Quassnoi
Beaten by 1 sec. Nevertheless you are right. +1
The Chairman
Seems it's me who's beaten: your answer's `id` is smaller by `1` :)
Quassnoi
See comment above; I don't think you understood my question. OFFSET/LIMIT is great if I want to select, say, the 12th-ranked item. But I don't. I want to select, say, the item with id 37, and display "this item ranked 43rd out of 312" on the site. I don't see how OFFSET/LIMIT helps there.
Carl Meyer
Wow, impressive. I'll try that later on and see if it fits the bill.
Carl Meyer
Wish I could accept two answers - this is an impressive query. However, I ended up deciding it would be less trouble to upgrade to 8.4 and get access to rank(), so I haven't actually tried this; feel like I should accept the answer I am actually using.
Carl Meyer
+1  A: 

If you want the rank, do something like

SELECT id,num,rank FROM (
  SELECT id,num,rank() OVER (ORDER BY num) FROM foo
) AS bar WHERE id=4

Or if you actually want the row number, use

SELECT id,num,row_number FROM (
  SELECT id,num,row_number() OVER (ORDER BY num) FROM foo
) AS bar WHERE id=4

They'll differ when you have equal values somewhere. There is also dense_rank() if you need that.

This requires PostgreSQL 8.4, of course.

Magnus Hagander
That syntax is sure a lot nicer. Maybe I'll have to consider what it would take to upgrade.
Carl Meyer
As for now, I've never encountered a problem from upgrading with a preperly designed database, while the benefits are numerous. One of my customers, however, found that with new `HashAggregate` method, `DISTINCT` doesn't necessarily sort anymore which broke some of his queries. It's him to blame, of course, but make sure your queries do not rely on these tricks.
Quassnoi
The original question specified 8.3, but I decided it was worth upgrading to 8.4 to get access to these functions. Working great, thanks for the answer!
Carl Meyer
+1  A: 

Previous replies tackle the question "select all rows and get their rank" which is not what you want...

  • you have a row
  • you want to know its rank

Just do :

SELECT count(*) FROM table WHERE score > $1

Where $1 is the score of the row you just selected (I suppose you'd like to display it so you might select it...).

Or do :

SELECT a., (SELECT count() FROM table b WHERE score > b.score) AS rank FROM table AS a WHERE pk = ...

However, if you select a row which is ranked last, yes you will need to count all the rows which are ranked before it, so you'll need to scan the whole table, and it will be very slow.

Solution :

SELECT count(*) FROM (SELECT 1 FROM table WHERE score > $1 LIMIT 30)

You'll get precise ranking for the 30 best scores, and it will be fast. Who cares about the losers ?

OK, If you really do care about the losers, you'll need to make a histogram :

Suppose score can go from 0 to 100, and you have 1000000 losers with score < 80 and 10 winners with score > 80.

You make a histogram of how many rows have a score of X, it's a simple small table with 100 rows. Add a trigger to your main table to update the histogram.

Now if you want to rank a loser which has score X, his rank is sum( histo ) where histo_score > X.

Since your score probably isn't between 0 and 100, but (say) between 0 and 1000000000, you'll need to fudge it a bit, enlarge your histogram bins, for instance. so you only need 100 bins max, or use some log-histogram distribution function.

By the way postgres does this when you ANALYZE the table, so if you set statistics_target to 100 or 1000 on score, ANALYZE, and then run :

EXPLAIN SELECT * FROM table WHERE score > $1

you'll get a nice rowcount estimate.

Who needs exact answers ?

peufeu