Is there a nice way in MySQL to replicate the MS SQL Server function ROW_NUMBER()?

For example:

SELECT 
    col1, col2, 
    ROW_NUMBER() OVER (PARTITION BY col1, col2 ORDER BY col3 DESC) AS intRow
FROM Table1

Then I could, for example, add a condition to limit intRow to 1 to get a single row with the highest col3 for each (col1, col2) pair.

A: 

Wouldn't

SELECT col1, col2
FROM Table1
ORDER BY col3 DESC
LIMIT 1

do what you need?

Cassy
No, that would give the single row with the highest col3. I want the row with the single highest col3 for each (col1, col2) pair.
Paul
This would limit the result set to one row, not one row per (col1,col2) pair. (But it was also the first thing that came into my mind)
Felix Kling
OK, then I misunderstood your question. Felix's answer it is, then.
Cassy
+4  A: 

There is no ranking functionality in MySQL. The closest you can get is to use a variable:

SELECT t.*, 
       @rownum := @rownum + 1 AS rank
  FROM TABLE t, (SELECT @rownum := 0) r


so how would that work in my case? I'd need two variables, one for each of col1 and col2? Col2 would need resetting somehow when col1 changed..?

Yes. If it were Oracle, you could use the LEAD function to peek at the next value. Thankfully, Quassnoi covers the logic for what you need to implement in MySQL.

OMG Ponies
Hmm....so how would that work in my case? I'd need two variables, one for each of col1 and col2? Col2 would need resetting somehow when col1 changed..?
Paul
Thanks... as I said above, this answer is just as acceptable as bobince's, but I can only tick one :-)
Paul
Assigning to and reading from user-defined variables in the same statement is not reliable. This is documented at http://dev.mysql.com/doc/refman/5.0/en/user-variables.html: "As a general rule, you should never assign a value to a user variable and read the value within the same statement. You might get the results you expect, but this is not guaranteed. The order of evaluation for expressions involving user variables is undefined and may change based on the elements contained within a given statement."
Roland Bouman
@Roland: I've only tested on small datasets, haven't had any issue. Too bad MySQL has yet to address the functionality - the request has been in since 2008
OMG Ponies
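Given the warning above about reading and assigning user variables in one statement, a variable-free way to number rows per (col1, col2) partition is to count, for each row, how many rows in the same partition sort ahead of it. The sketch below is only an illustration under stated assumptions: it runs the SQL through Python's built-in sqlite3 module as a stand-in for MySQL, and the `id` column and the sample data are hypothetical additions used to break ties deterministically. The SELECT itself uses only constructs that MySQL also accepts.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Table1 (id INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT, col3 INTEGER)"
)
# Hypothetical sample rows; the (b, x) partition contains a tie on col3.
conn.executemany(
    "INSERT INTO Table1 (id, col1, col2, col3) VALUES (?, ?, ?, ?)",
    [(1, "a", "x", 10), (2, "a", "x", 30), (3, "a", "y", 5),
     (4, "b", "x", 7), (5, "b", "x", 7)],
)

# intRow = 1 + number of rows in the same partition that sort before this one
# (higher col3 first; equal col3 is broken by the lower id).
numbered_sql = """
SELECT t.col1, t.col2, t.col3,
       1 + (SELECT COUNT(*)
            FROM Table1 AS u
            WHERE u.col1 = t.col1 AND u.col2 = t.col2
              AND (u.col3 > t.col3
                   OR (u.col3 = t.col3 AND u.id < t.id))) AS intRow
FROM Table1 AS t
ORDER BY t.col1, t.col2, intRow
"""
numbered = conn.execute(numbered_sql).fetchall()

# Keeping intRow = 1 yields exactly one row per (col1, col2) pair.
top_rows = [row[:3] for row in numbered if row[3] == 1]
```

The correlated subquery is evaluated once per row, so this is likely to be slow on large partitions unless (col1, col2, col3) is indexed.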
+4  A: 

I want the row with the single highest col3 for each (col1, col2) pair.

That's a groupwise maximum, one of the most commonly-asked SQL questions (since it seems like it should be easy, but actually it kind of isn't).

I often plump for a null-self-join:

SELECT t0.col3
FROM Table1 AS t0
LEFT JOIN Table1 AS t1 ON t0.col1=t1.col1 AND t0.col2=t1.col2 AND t1.col3>t0.col3
WHERE t1.col1 IS NULL;

“Get the rows in the table for which no other row with matching col1,col2 has a higher col3.” (You will notice this and most other groupwise-maximum solutions will return multiple rows if more than one row has the same col1,col2,col3. If that's a problem you may need some post-processing.)
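
To make the tie behaviour concrete, here is a small runnable sketch of the null-self-join. It is an illustration under assumptions, not a MySQL session: Python's built-in sqlite3 module stands in for MySQL, the table is named Table1 as in the question, and the sample rows are made up.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (col1 INTEGER, col2 INTEGER, col3 INTEGER)")
# Hypothetical sample rows; the (2, 1) pair has two rows tied on col3.
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?, ?)",
    [(1, 1, 10), (1, 1, 30), (1, 2, 5), (2, 1, 7), (2, 1, 7)],
)

# Keep each row for which no other row with the same (col1, col2)
# has a strictly higher col3.
query = """
SELECT t0.col1, t0.col2, t0.col3
FROM Table1 AS t0
LEFT JOIN Table1 AS t1
  ON t0.col1 = t1.col1 AND t0.col2 = t1.col2 AND t1.col3 > t0.col3
WHERE t1.col1 IS NULL
ORDER BY t0.col1, t0.col2
"""
groupwise_max = conn.execute(query).fetchall()
```

With this data the (1, 1) and (1, 2) pairs each produce one row, while the tied (2, 1) pair produces two identical rows, which is the multiple-row case mentioned above.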

bobince
But what if there are two maximal values of col3 for a (col1, col2) pair? You'd end up with two rows.
Paul
@Paul: yes! Just added a note about that in the answer a tic ago. You can usually easily drop unwanted extra rows in the application layer afterwards on some random basis, but if you have a *lot* of rows all with the same col3 it can be problematic.
bobince
In T-SQL I tend to need this as a sub-query as part of a much larger query, so post-processing isn't really an option. Also... what if you wanted the rows with the top n highest values of col3? With my T-SQL example, you can add the constraint of intRow <= n, but this would be very hard with a self-join.
Paul
If you took “with the single highest col3” literally you could make it return no rows instead of 2 in this case by using `>=` instead of `>`. But that's unlikely to be what you want! Another option in MySQL is to finish with `GROUP BY col1, col2` without using an aggregate expression for col3; MySQL will pick an arbitrary row. However this is invalid in ANSI SQL and generally considered really bad practice.
bobince
For top N rows you have to add more joins or subqueries for each N, which soon gets unwieldy. Unfortunately LIMIT does not work in subqueries and there's no other arbitrary-selection-order or general windowing function.
bobince
Thanks, yes that makes sense. In the case of multiple maxima it certainly will have to be an arbitrary row, so the GROUP BY seems logical. The extra joins or subqueries sound a bit dubious though, especially if n is variable. The choice of preferred answer is a toss-up between this and OMG Ponies', as they both will replicate the functionality I need, but in a somewhat hard-to-read, slightly hacky way.
Paul
@bobince: There's an easy solution to get the top N rows. See http://stackoverflow.com/questions/1442527/how-to-select-the-newest-four-items-per-category/1442867#1442867
Bill Karwin
@Bill Karwin: That's a nice solution. Although in this case, the column we're sorting upon isn't necessarily unique so we may get more than n values.
Paul
@Bill: nifty! What's the performance like on this sort of query, generally? Seeing heavy lifting in `HAVING` always makes me nervous. :-)
bobince
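For reference, one common way to write a top-n-per-group query is a self-join that counts, for every row, how many rows in the same group rank above it, then keeps the rows where that count is below n. The sketch below is a rough illustration and is not claimed to be identical to the linked answer. Assumptions: Python's built-in sqlite3 module stands in for MySQL, and the `id` primary key and the sample rows are hypothetical.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Table1 (id INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT, col3 INTEGER)"
)
# Hypothetical sample rows; every row in the (b, y) group ties on col3.
conn.executemany(
    "INSERT INTO Table1 (id, col1, col2, col3) VALUES (?, ?, ?, ?)",
    [(1, "a", "x", 10), (2, "a", "x", 30), (3, "a", "x", 20), (4, "a", "x", 5),
     (5, "b", "y", 7), (6, "b", "y", 7), (7, "b", "y", 7)],
)

# For each row t0, COUNT(t1.id) is the number of rows in the same
# (col1, col2) group with a strictly higher col3; keep rows with fewer than n.
top_n_sql = """
SELECT t0.id, t0.col1, t0.col2, t0.col3
FROM Table1 AS t0
LEFT JOIN Table1 AS t1
  ON t0.col1 = t1.col1 AND t0.col2 = t1.col2 AND t1.col3 > t0.col3
GROUP BY t0.id, t0.col1, t0.col2, t0.col3
HAVING COUNT(t1.id) < ?
ORDER BY t0.col1, t0.col2, t0.col3 DESC, t0.id
"""
n = 2
top_n = conn.execute(top_n_sql, (n,)).fetchall()
```

With n = 2 the (a, x) group yields its two highest rows, while the (b, y) group yields all three tied rows. That is the "more than n values" effect raised above for a non-unique sort column; adding a tie-breaker on a unique column to the join condition would cap each group at n.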
A: 

SELECT @i:=@i+1 AS iterator, t.*
FROM tablename t,(SELECT @i:=0) foo

Peter Johnson