views:

118

answers:

4

I want to get some random records from db. There is two solution for this :

1- Using TABLESAMPLE for getting data from db directly.

2- Write a method In my application for doing this. In this method we generate multiple random number and get data like this :

select * from db where ID = @RandomNumber

if this ID does not exist, I pass a new number.

Now which one has better performance?

+1  A: 

I would use TABLESAMPLE, as its makes it very easy to generate sample data. I expect it would be more efficient as you only call one piece of SQL.

e.g.

USE AdventureWorks ;
GO
SELECT FirstName, LastName
FROM Person.Contact 
TABLESAMPLE (10 PERCENT) 

In your other example, you will have to keep on calling select * from db where ID = @RandomNumber many times.

If you after individual rows then i would use another method, some form of random TOP 1 etc...

kevchadders
+3  A: 

According to the documentation for TABESAMPLE you shouldn't use it if you "really want a sample of individual rows":

If you really want a random sample of individual rows, modify your query to filter out rows randomly, instead of using TABLESAMPLE. For example, the following query uses the NEWID function to return approximately one percent of the rows of the Sales.SalesOrderDetail table:

 SELECT  * 
 FROM    Sales.SalesOrderDetail
 WHERE   0.01 >= CAST(CHECKSUM(NEWID(), SalesOrderID) & 0x7fffffff AS float)
         / CAST (0x7fffffff AS int)

The SalesOrderID column is included in the CHECKSUM expression so that NEWID() evaluates once per row to achieve sampling on a per-row basis. The expression CAST(CHECKSUM(NEWID(),> SalesOrderID) & 0x7fffffff AS float / CAST(0x7fffffff AS int) evaluates to a random float value between 0 and 1.

Either way, given the potentially endless number of requests you could make by passing in @RandomNumber (in theory the first 1000 requests you make might return nothing), the better approach is to limit the resultset on the server.

Rob
+2  A: 

try this:

SELECT TOP 1 * FROM db
ORDER BY NEWID()

the NewID function will generate UniqueIdentifier value and it will be random. Source: SQL to Select a random row from a database table

Wael Dalloul
The question was "which method gives a better performance", so I would not recommend ORDER BY NEWID(). It requires assigning NEWID() to all rows, then sorting them all. This is expensive for large recordsets!
filiprem
A: 

I recommend to read a post about various methods to get random row from table. It's based on PostgreSQL, but I'm sure that 90% applies to SQL Server too.

Of course most flexible and best performing solution can be achieved by writing a stored procedure.

Cost (hence: best performance) of getting truly random sample depends on data (type of data, statistics and distribution, including sparseness).

filiprem