ansaurus

Question

Best way to randomly select rows *per* column in SQL Server

Answer 1

+1 A:

You could select the top N random rows at once (where N=3 is the number of columns), and then take column 1 from row 1, column 2 from row 2, etc. I'm not sure exactly how to do that last step in SQL, but if you're willing to do the last step in some other language I'm sure it would be simple.

Also, selecting N rows at once gives you an added guarantee: you would never take two columns from the same row (though this breaks down if there are more columns than rows).
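As a rough illustration of doing that last step outside SQL, here is a minimal Python sketch. It assumes the N random rows have already been fetched as a list of dicts; the function name `pick_diagonal` and the sample data are hypothetical, not part of the original question.

```python
import random

def pick_diagonal(rows, columns, rng=random):
    """Sample len(columns) distinct rows, then take column i from row i."""
    if len(rows) < len(columns):
        # The caveat from the answer: more columns than rows cannot work.
        raise ValueError("need at least as many rows as columns")
    sampled = rng.sample(rows, len(columns))  # distinct rows, random order
    return {col: row[col] for col, row in zip(columns, sampled)}

# Hypothetical stand-in for rows fetched from the Customer table.
customers = [
    {"FirstName": "Ann", "LastName": "Lee", "State": "OH"},
    {"FirstName": "Bob", "LastName": "Kim", "State": "TX"},
    {"FirstName": "Cy", "LastName": "Roy", "State": "WA"},
    {"FirstName": "Dee", "LastName": "Fox", "State": "NY"},
]
columns = ["FirstName", "LastName", "State"]
print(pick_diagonal(customers, columns))
```

Because `sample` draws without replacement, each output value comes from a different source row, which is the property described above.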

mathmike 2010-04-28 17:01:06

+1 I like the additional assurance that no real First/Last names would be paired with this technique.

LesterDove 2010-04-28 18:22:18

Answer 2

+3 A:

ORDER BY NEWID() works with ROW_NUMBER in SQL Server 2008. I'm not sure about SQL Server 2005.

The ROW_NUMBER value is needed to give the 3 separate queries something to join on. It's slightly counterintuitive, because you'd think TOP 100 would always take the same first 100 rows, just numbered in a different order, but it doesn't.

-- Each CTE numbers its column's rows in an independent random order.
;WITH F AS
(
  SELECT TOP 100
    FirstName, ROW_NUMBER() OVER (ORDER BY NEWID()) AS Foo
  FROM Customer
), L AS
(
  SELECT TOP 100
    LastName, ROW_NUMBER() OVER (ORDER BY NEWID()) AS Foo
  FROM Customer
), S AS
(
  SELECT TOP 100
    State, ROW_NUMBER() OVER (ORDER BY NEWID()) AS Foo
  FROM Customer
)
-- Join on the random row number so each output row mixes values from different customers.
SELECT
  F.FirstName, L.LastName, S.State
FROM
  F
  JOIN L ON F.Foo = L.Foo
  JOIN S ON F.Foo = S.Foo
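To make the idea behind the query easier to see, here is a small Python sketch of the same pattern: shuffle each column independently, then pair values by position. It is only an illustration under assumptions; `scramble_columns`, its `limit` parameter, and the sample rows are hypothetical, and it does not reproduce SQL Server's actual behavior.

```python
import random

def scramble_columns(rows, columns, limit=100, rng=random):
    """Shuffle each column independently, then pair values by position."""
    shuffled = {}
    for col in columns:
        values = [row[col] for row in rows]
        rng.shuffle(values)             # plays the role of ORDER BY NEWID()
        shuffled[col] = values[:limit]  # plays the role of TOP 100
    # Pairing position i of every column is the analogue of joining on Foo.
    return [dict(zip(columns, vals))
            for vals in zip(*(shuffled[c] for c in columns))]

# Hypothetical sample data standing in for the Customer table.
rows = [
    {"FirstName": "Ann", "LastName": "Lee", "State": "OH"},
    {"FirstName": "Bob", "LastName": "Kim", "State": "TX"},
    {"FirstName": "Cy", "LastName": "Roy", "State": "WA"},
]
for mixed in scramble_columns(rows, ["FirstName", "LastName", "State"]):
    print(mixed)
```

Unlike the N-rows-at-once approach, nothing here prevents a value from landing back next to its original neighbors by chance.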

gbn 2010-04-28 17:04:14

+1 Very nice use of NEWID and ROW_NUMBER

Meff 2010-04-29 12:05:18

Something new; thanks. And yes it worked in 2005

LesterDove 2010-04-29 13:42:29

Answer 3

A:

It seems to me that you are actually trying to generate random data -- the fact that you already have a bunch that is non-random is really just a side note. If I were in your shoes, I would look at generating random customers by choosing random words from the dictionary to use as FName, LName, City, etc. That seems easier and more random anyway.
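A minimal Python sketch of that suggestion, assuming a word list is available. The `WORDS` list and the `random_customer` function are hypothetical placeholders; a real dictionary file would take the list's place.

```python
import random

# Hypothetical stand-in for a dictionary word list.
WORDS = ["lantern", "harbor", "maple", "quartz", "meadow", "copper"]

def random_customer(words, rng=random):
    """Build one fake customer from randomly chosen dictionary words."""
    first, last, city = (rng.choice(words).capitalize() for _ in range(3))
    return {"FName": first, "LName": last, "City": city}

print(random_customer(WORDS))
```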

MJB 2010-04-28 17:06:56

Thank you - that would indeed take care of the 'randomness' issue. It's for a sample report, so I don't think a customer named Mr. Pogostick Zucchini would be appropriate. I suppose what I want is randomized customer data, to put it a better way. This is a deficiency in my original post, sorry.

LesterDove 2010-04-28 18:15:34

Gotcha. Now I understand.

MJB 2010-04-28 19:45:23
