This question explains how to select a random sample from Oracle, which is exactly what I need. However, I do not understand the difference between that solution

SELECT  *
FROM    (
        SELECT  *
        FROM    mytable
        ORDER BY
                dbms_random.value
        )
WHERE rownum <= 1000

and something like

select * from mytable where rownum<=1000 order by dbms_random.value

When I query using the first method, it takes a long time (still hasn't finished) but when I query using the 2nd method, it's very quick but the results don't seem to be random.

Appreciate any advice/direction y'all can provide.

Thanks!

JC

+1  A: 

In Oracle, ORDER BY is evaluated after ROWNUM.

This query:

SELECT  id, ROWNUM
FROM    (
        SELECT  NULL AS id
        FROM    dual
        UNION ALL
        SELECT  1 AS id
        FROM    dual
        )
ORDER BY
        id

will retrieve the following:

  id    rownum
----    ------
   1         2
NULL         1

Your first query first orders all the rows randomly, then selects the first thousand records, which takes a long time because the whole table has to be sorted.

The second query first selects 1000 records, then sorts them in random order, which is of course faster but the results are not random.
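To see this on a small scale, here is a minimal sketch that needs no real table. The ten-row generator built with CONNECT BY LEVEL and the column name n are my own stand-ins for mytable, not something from the question.

```sql
-- Hypothetical stand-in for mytable: ten rows numbered 1..10.
-- ROWNUM <= 3 is applied before the ORDER BY, so the filter keeps
-- the first three rows produced by the inline view, and only those
-- three rows are then shuffled.
SELECT n
FROM   (
        SELECT LEVEL AS n
        FROM   dual
        CONNECT BY LEVEL <= 10
       )
WHERE  ROWNUM <= 3
ORDER BY
       dbms_random.value;
```

Run it a few times: you would typically get the same three values back each time, only in a different order, which is the behaviour the asker is seeing with 1000 rows.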

Quassnoi
+6  A: 

Oracle selects the rows based on the criteria before any sorting takes place. Therefore, your second query can be read as:

  1. Select the first 1000 rows from mytable
  2. Sort these 1000 rows by random value

Therefore, you will always be getting the same 1000 rows, just in a random order. The first query forces Oracle to sort all rows randomly first:

  1. Sort all rows by random value
  2. Select the first 1000 of these randomly ordered rows
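If you are on Oracle 12c or later (an assumption, since the question does not say which version is in use), the same "sort everything, then take the first 1000" logic can be sketched without the subquery by using the row-limiting clause:

```sql
-- Assumes Oracle 12c or later, where FETCH FIRST is available.
-- The row limit is applied after the ORDER BY, so this behaves like
-- the first query in the question: every row is sorted by a random
-- value and the first 1000 of the sorted rows are returned.
SELECT  *
FROM    mytable
ORDER BY
        dbms_random.value
FETCH FIRST 1000 ROWS ONLY;
```

This is only a tidier way to write it. It still has to sort the whole table, so do not expect it to be faster than the subquery version.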
Adam Paynter
A: 

The second one will return 1000 records and order them at random. The first query takes more time because it orders all the records and then extracts the 1000 that randomly ended up in the first 1000 positions.

I am afraid that, slow or not, you need something like the first query.

borjab
+1  A: 

A faster alternative, if an approximate percentage of rows is acceptable instead of an exact count:

SELECT * FROM emp SAMPLE(10);

or

SELECT * FROM emp SAMPLE BLOCK (5);

Read here: http://oracleact.com/papers/sampleclause.html
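If you need exactly 1000 rows, SAMPLE can be combined with the random sort from the question, so that only a fraction of the table is sorted. This is a rough sketch, not a tuned solution: the 10 percent figure is an assumption and has to be large enough for the sample to contain at least 1000 rows, otherwise fewer rows come back.

```sql
-- Sketch: shrink the table with SAMPLE first, then sort only the
-- sampled rows randomly and keep the first 1000 of them.
-- The percentage (10) is a placeholder; choose it from the size
-- of mytable so that the sample holds comfortably more than 1000 rows.
SELECT  *
FROM    (
        SELECT  *
        FROM    mytable SAMPLE (10)
        ORDER BY
                dbms_random.value
        )
WHERE   rownum <= 1000;
```

The sort now works on roughly a tenth of the rows instead of all of them, which is where the time saving would come from.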

EDIT1: After rereading, this is already mentioned (more or less). However I can't delete this answer.

Theo
I think it's worth having this answer here in case someone reads this question and just copies the query without looking further.
Jeffrey Kemp