views:

51

answers:

3

I need to select sample rows from a set. For example if my select query returns x rows then if x is greater than 50 , I want only 50 rows returned but not just the top 50 but 50 that are evenly spread out over the resultset. The table in this case records routes - GPS locations + DateTime. I am ordering on DateTime and need a reasonable sample of the Latitude & Longitude values. Thanks in advance [ SQL Server 2008 ]

+1  A: 

I suggest that you add a calculated column to your resultset on selection that is obtained as a random number, and then select the top 50 sorted by that column. That will give you a random sample.

For example:

SELECT TOP 50 *, RAND(Id) AS Random
FROM SourceData
ORDER BY Random

where SourceData is your source data table or view. This assumes T-SQL on SQL Server 2008, by the way. It also assumes that you have an Id column with unique ids on your data source. If your ids are very low numbers, it is a good practice to multiply them by a large integer before passing them to RAND, like this:

RAND(Id * 10000000)
CesarGon
+1  A: 

To get sample rows in SQL Server, use this query:

SELECT TOP 50 * FROM Table
ORDER BY NEWID();

If you want to get every n-th row (10th, in this example), try this query:

SELECT * From 
(
  SELECT  *, (Dense_Rank() OVER (ORDER BY Column ASC)) AS Rank 
  FROM Table
) AS Ranking 
WHERE Rank % 10 = 0; 

Source

More examples of queries selecting random rows for other popular RDBMS can be found here: http://www.petefreitag.com/item/466.cfm

jimmy_keen
i think it's MS SQL Server
Srinivas Reddy Thatiparthy
+1  A: 

Every n'th row to get 50:

SELECT *
FROM table
WHERE row_number() over() MOD (SELECT Count(*) FROM table) / 50 == 0
FETCH FIRST 50 ROWS ONLY

And if you want a random sample, go with jimmy_keen's answer.

UPDATE: In regard to the requirement for it to run on MS SQL, I think it should be changed to this (no MS SQL Server around to test though):

  SELECT TOP 50 *
   FROM (
     SELECT t.*, row_number() over() AS rn, (SELECT count(*) FROM table) / 50 AS step
     FROM table t
   )
   WHERE rn % step == 0
inflagranti
Wow Thanks inflagranti this is the business. Thanks to the others as well.
Pierce
That code does not run on SQL Server 2008.
CesarGon
It should be standard SQL.@Pierce: Didn't you have just the SQL-tag in the begining?
inflagranti
yes .. edited it later in response to request
Pierce

related questions