ansaurus

Question

Which solution is better for getting random record set from db?

Answer 1

+1 A:

I would use TABLESAMPLE, as its makes it very easy to generate sample data. I expect it would be more efficient as you only call one piece of SQL.

e.g.

USE AdventureWorks ;
GO
SELECT FirstName, LastName
FROM Person.Contact 
TABLESAMPLE (10 PERCENT)

In your other example, you will have to keep on calling select * from db where ID = @RandomNumber many times.

If you after individual rows then i would use another method, some form of random TOP 1 etc...

kevchadders 2010-03-02 09:08:49

Answer 2

+3 A:

According to the documentation for TABESAMPLE you shouldn't use it if you "really want a sample of individual rows":

If you really want a random sample of individual rows, modify your query to filter out rows randomly, instead of using TABLESAMPLE. For example, the following query uses the NEWID function to return approximately one percent of the rows of the Sales.SalesOrderDetail table:
 SELECT  * 
 FROM    Sales.SalesOrderDetail
 WHERE   0.01 >= CAST(CHECKSUM(NEWID(), SalesOrderID) & 0x7fffffff AS float)
         / CAST (0x7fffffff AS int)
The SalesOrderID column is included in the CHECKSUM expression so that NEWID() evaluates once per row to achieve sampling on a per-row basis. The expression CAST(CHECKSUM(NEWID(),> SalesOrderID) & 0x7fffffff AS float / CAST(0x7fffffff AS int) evaluates to a random float value between 0 and 1.

Either way, given the potentially endless number of requests you could make by passing in @RandomNumber (in theory the first 1000 requests you make might return nothing), the better approach is to limit the resultset on the server.

Rob 2010-03-02 09:10:03

Answer 3

+2 A:

try this:

SELECT TOP 1 * FROM db
ORDER BY NEWID()

the NewID function will generate UniqueIdentifier value and it will be random. Source: SQL to Select a random row from a database table

Wael Dalloul 2010-03-02 09:10:26

The question was "which method gives a better performance", so I would not recommend ORDER BY NEWID(). It requires assigning NEWID() to all rows, then sorting them all. This is expensive for large recordsets!

filiprem 2010-03-02 09:22:19

Answer 4

A:

I recommend to read a post about various methods to get random row from table. It's based on PostgreSQL, but I'm sure that 90% applies to SQL Server too.

Of course most flexible and best performing solution can be achieved by writing a stored procedure.

Cost (hence: best performance) of getting truly random sample depends on data (type of data, statistics and distribution, including sparseness).

filiprem 2010-03-02 10:09:12

ansaurus

tags:

views:

answers:

Which solution is better for getting random record set from db?

related questions