I have an Oracle database populated with a million records. I am trying to write a SQL query that returns the first N sorted records (say, 100 records) that match a certain condition.

SELECT * 
FROM myTable 
WHERE SIZE > 2000
ORDER BY NAME DESC

I then select the first N records programmatically.

The problems with this approach are:

  • The query matches half a million records, and ORDER BY NAME DESC sorts all of them by NAME in descending order. This sort takes a long time: nearly 30-40 seconds, versus only 1 second if I omit the ORDER BY.
  • After the sort I am interested in only the first N (100) records, so sorting the complete result set is wasted work.

My questions are:

  1. Is it possible to specify N in the query itself, so that the sort applies to only N records and the query becomes faster?
  2. Is there a better way in SQL to sort only N elements and return them quickly?
+1  A: 

Add this:

 AND rownum <= 100

to your WHERE-clause.

However, this won't do what you're asking.

If you want to pick 100 random rows, sort those, and then return them, you'll have to formulate a query without the ORDER BY first, then limit that to 100 rows, then select from that and sort.

This could work, but unfortunately I don't have an Oracle server available to test:

SELECT *
FROM (
    SELECT *
    FROM myTable
    WHERE SIZE > 2000
      AND rownum <= 100
    ) x
ORDER BY NAME DESC

But note the "random" part there: you're saying "give me 100 rows with SIZE > 2000, I don't care which 100".

Is that really what you want?

And no, you won't actually get a random result in the sense that it changes each time you query the server, but you are at the mercy of the query optimizer. If the data load and index statistics for that table change over time, at some point you might get different data than you did on the previous query.

Lasse V. Karlsen
Thanks for the answer. I don't want a random 100; I want the first 100 sorted records. For example, if the records are 1,5,8,2,14,3,6,7 and I want 3 records, then the answer would be (1,2,3).
aJ
Then you *do* want them sorted first, and if sorting your million rows takes a lot of time, that won't help much. All you avoid is retrieving all the rows over the network; the sort still has to run.
Lasse V. Karlsen
However, Oracle is smart enough to keep only the top 100 results as it goes. If the next row falls outside that 100, it discards it, so it does not have to sort the entire thing. For a fixed N this is O(n) instead of O(n log n).
WW
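The idea in the comment above (keep only the best N rows seen so far and discard the rest) can be illustrated outside the database. This is a minimal Python sketch of a bounded top-N selection on in-memory data; it is not a description of Oracle's internal implementation, and the function name `top_n_desc` and the sample data are made up for illustration.

```python
import heapq
import random

def top_n_desc(values, n):
    """Return the n largest values in descending order using a bounded min-heap."""
    heap = []  # min-heap holding the best n values seen so far
    for v in values:
        if len(heap) < n:
            heapq.heappush(heap, v)
        elif v > heap[0]:
            # v beats the smallest value we kept, so it takes that slot
            heapq.heapreplace(heap, v)
        # otherwise v falls outside the top n and is discarded immediately
    return sorted(heap, reverse=True)  # only n values are ever fully sorted

random.seed(0)
names = [random.randrange(1_000_000) for _ in range(10_000)]  # hypothetical data
result = top_n_desc(names, 100)
```

Only the 100 retained values are sorted at the end; every other value costs at most one comparison against the smallest retained value.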
+12  A: 

Hi aJ,

If your purpose is to find 100 random rows and sort them afterwards, then Lasse's solution is correct. If, as I think, you want the first 100 rows sorted by name while discarding the others, you would build a query like this:

SELECT * 
  FROM (SELECT * 
          FROM myTable 
         WHERE SIZE > 2000 ORDER BY NAME DESC) 
 WHERE ROWNUM <= 100

The optimizer will understand that it is a TOP-N query and will be able to use an index on NAME, if one exists. It won't have to sort the entire result set; it will just start at the end of the index, read it backwards, and stop after 100 rows.

You could also add a hint to your original query to let the optimizer understand that you are interested in the first rows only. This will probably generate a similar access path:

SELECT /*+ FIRST_ROWS */ * FROM myTable WHERE SIZE > 2000 ORDER BY NAME DESC

Edit: just adding AND rownum <= 100 to the query won't work, since in Oracle rownum is assigned before sorting; this is why you have to use a subquery. Without the subquery Oracle will select 100 arbitrary rows and then sort them.
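To make the ordering issue concrete, here is a small Python sketch that mimics the two evaluation orders on a plain list. It is only an analogy, not Oracle code: the function names are invented, the values come from the asker's comment, and it sorts ascending as in that comment rather than descending as in the query.

```python
def rownum_then_sort(rows, n):
    # Mimics "WHERE ... AND ROWNUM <= n ORDER BY ...":
    # the first n rows in arrival order are picked, and only those are sorted.
    return sorted(rows[:n])

def sort_then_rownum(rows, n):
    # Mimics "SELECT * FROM (... ORDER BY ...) WHERE ROWNUM <= n":
    # everything is ordered first, then the first n rows are kept.
    return sorted(rows)[:n]

rows = [1, 5, 8, 2, 14, 3, 6, 7]  # arrival order, taken from the asker's example

print(rownum_then_sort(rows, 3))  # [1, 5, 8]
print(sort_then_rownum(rows, 3))  # [1, 2, 3]
```

Only the second form returns the (1,2,3) the asker expects.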

Vincent Malgrat
+5  A: 

Here is how to pick the top N rows, depending on your version of Oracle.

From Oracle 9i onwards, the RANK() and DENSE_RANK() functions can be used to determine the TOP N rows. Examples:

Get the top 10 employees based on their salary

SELECT ename, sal
  FROM (SELECT ename, sal,
               RANK() OVER (ORDER BY sal DESC) sal_rank
          FROM emp)
 WHERE sal_rank <= 10;

Select the employees making the top 10 salaries

SELECT ename, sal
  FROM (SELECT ename, sal,
               DENSE_RANK() OVER (ORDER BY sal DESC) sal_dense_rank
          FROM emp)
 WHERE sal_dense_rank <= 10;

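The difference between the two is how ties are numbered: after a tie, RANK() skips the following rank values, while DENSE_RANK() does not. So the first query returns roughly 10 employees, and the second returns everyone earning one of the 10 highest distinct salaries.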
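The tie handling of the two functions can be sketched in a few lines of Python. This is an illustrative re-implementation of the numbering rules only, not Oracle code; the function names and the salary figures are hypothetical.

```python
def rank_desc(values):
    """Number values like RANK() OVER (ORDER BY v DESC): ties share a rank, gaps follow."""
    ordered = sorted(values, reverse=True)
    ranks = []
    for i, v in enumerate(ordered):
        if i > 0 and v == ordered[i - 1]:
            ranks.append(ranks[-1])   # tie: same rank as the previous row
        else:
            ranks.append(i + 1)       # position in the list, so ties leave a gap
    return list(zip(ordered, ranks))

def dense_rank_desc(values):
    """Number values like DENSE_RANK() OVER (ORDER BY v DESC): ties share a rank, no gaps."""
    ordered = sorted(values, reverse=True)
    ranks = []
    for i, v in enumerate(ordered):
        if i == 0:
            ranks.append(1)
        elif v == ordered[i - 1]:
            ranks.append(ranks[-1])      # tie: same rank as the previous row
        else:
            ranks.append(ranks[-1] + 1)  # next consecutive rank, no gap
    return list(zip(ordered, ranks))

salaries = [5000, 3000, 3000, 2975, 2850]  # hypothetical sample data

print(rank_desc(salaries))        # [(5000, 1), (3000, 2), (3000, 2), (2975, 4), (2850, 5)]
print(dense_rank_desc(salaries))  # [(5000, 1), (3000, 2), (3000, 2), (2975, 3), (2850, 4)]
```

With a filter of rank <= 3, the first numbering keeps three rows and the second keeps four.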

IanH
A: 

Your problem is that the sort is being done every time the query is run. The optimiser can use an index to eliminate the sort operation, provided the sorted column is declared NOT NULL.

(If the column is nullable, this is still possible, either by (a) adding an IS NOT NULL predicate to the query, or (b) adding a function-based index and modifying the ORDER BY clause accordingly.)

Jeffrey Kemp