views:

1138

answers:

3

I have a huge table of > 10 million rows. I need to efficiently grab a random sampling of 5000 from it. I have some constraints that reduce the total rows I am looking for to about 9 million.


I tried using order by NEWID(), but that query will take too long as it has to do a table scan of all rows.

Is there a faster way to do this?

+3  A: 

Yeah, tablesample is your friend (note that it's not random in the statistical sense of the word): Tablesample at msdn

friism
We are using SQL Server 2005, but our database compatibility level is at 80, so no tablesample. :( Any other ideas?
Byron Whitlock
select * from customers order by newid()
Albert
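
Since ORDER BY NEWID() is slow mainly because of the full sort, one workaround that should still work at compatibility level 80 is to filter rows on a random checksum instead of sorting. This still scans the table, but skips the expensive sort step. A rough sketch (the table name, the constraints, and the 1-in-1800 ratio — roughly 9,000,000 / 5,000 — are placeholders for your own schema):

    SELECT TOP 5000 *
    FROM MyBigTable                            -- placeholder table name
    WHERE <your constraints>
      AND ABS(CHECKSUM(NEWID())) % 1800 = 0    -- keep roughly 1 row in 1800

TOP 5000 caps the result in case the random filter passes slightly more rows than expected; if it passes too few, lower the modulus and rerun.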
+5  A: 

Hi,

Have you looked into using the TABLESAMPLE clause?

For example:

select *
from HumanResources.Department tablesample (5 percent)
John Sansom
A: 

If you can use a pseudo-random sampling and you're on SQL Server 2005/2008, then take a look at TABLESAMPLE. For instance, an example from SQL Server 2008 / AdventureWorks 2008 which samples based on a row count:

USE AdventureWorks2008; 
GO 


SELECT FirstName, LastName
FROM Person.Person 
TABLESAMPLE (100 ROWS)
WHERE EmailPromotion = 2;

The catch is that TABLESAMPLE isn't exactly random, as it generates a given number of rows from each physical page. You may not get back exactly 5000 rows unless you limit with TOP as well. If you're on SQL Server 2000, you're going to have to either generate a temporary table matching on the primary key or fall back to a NEWID()-based method.
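
To get exactly 5000 rows, one pattern is to oversample with TABLESAMPLE and then cap the result with TOP. A sketch against the same AdventureWorks table (the oversample size here is illustrative):

    SELECT TOP 5000 FirstName, LastName
    FROM Person.Person TABLESAMPLE (20000 ROWS)
    WHERE EmailPromotion = 2;

Note that the WHERE clause is applied after the pages are sampled, so oversample generously enough that at least 5000 sampled rows satisfy the filter.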

K. Brian Kelley
Wrong, tablesample works by selecting an appropriate number of pages and then returning all the rows found on those pages. The whole point is avoiding hitting all the pages holding the table.
friism
Sorry, you are right. I read the algorithm wrong. It determines the number of rows and then includes or excludes entire pages to get the approximate count.
K. Brian Kelley