tags:

views:

346

answers:

7

I'm trying to figure out which is the more efficient way to get the nth highest record in a mySQL database:

SELECT * 
FROM table_name
ORDER BY column_name DESC
LIMIT n - 1, 1

or

SELECT *
FROM table_name AS a 
WHERE n - 1 = (
    SELECT COUNT(primary_key_column) 
    FROM products b 
    WHERE  b.column_name > a. column_name)

There is an index on column_name.

I would think mySQL would efficiently perform the limit clause and the first option is the way to go.

I wasn't too clear what the 2nd query does exactly, so if that is more efficient can someone explain why.

Thanks.

+1  A: 

I think with the second query it's going to do an inner loop to run the subquery for evaluating against each row in table_name. If that is the case, this means you might have something like O(n^2) runtime.

Based on that I would personnally go with the first query, but if it were that important to me, I would do some performance testing. Make sure you test against very large data sets as well to get a good idea of how the performance scales. Something that runs at O(n) is faster for very small datasets, but something that runs at O(log(n)) is much better for large data sets.

AaronLS
O(n^2) doesnt really matter when n = 1 as is the case here :-)
James Anderson
n is the number of rows in the table, not the number of rows in the result set.
Neil Williams
Yes, I was referring to number of rows in the table since it must evaluate all of them.
AaronLS
A: 

This isn't really an answer but...

Go with the first query assuming your loads aren't super heavy, just because it works and is simple. You can always go back later and change if really necessary.

Jamie
+1  A: 

Run explain on both queries and see which one MySQL thinks is more complicated.

tpdi
+2  A: 

I tried EXPLAIN on both those queries on a database of mine (note: the optimizer may choose different plans for your schema/data) and it definitely looks like the first one wins in every regard: it's simpler to read and understand, and will most likely be faster.

As aaronls said, and EXPLAIN confirms, the second query has a correlated subquery which will require an extra iteration through the entire table for each row.

Since the first one is way easier to read, I'd choose it in a shot. If you do find that it's a bottleneck (after profiling your application), you could give the second a try but I don't see how it could possibly be faster.

Neil Williams
A: 

I would suggest (though I'm not sure about the exact SQL syntax myself) that you compute an additional RANK column on a simple query that orders elements as desired (DESC). Then just select the row where RANK = n.

You can probably do this with a variable that gets incremented I guess. It's basically something that says how many rows come before this row, so it should be very easy to compute.

Assaf Lavie
A: 

If you're really concerned about efficiency, maybe you should look into implementing a selection algorithm in SQL.

rampion
A: 

for 2nd highest record

select max(Price) as price from OrderDetails where Price<(select max(Price ) from OrderDetails)

for N th highest record

SELECT * FROM OrderDetails AS a WHERE n-1 = ( SELECT COUNT(OrderNo) FROM OrderDetails b WHERE b.Price > a. Price)