ansaurus

Question

Is this a good or bad way of generating random numbers for each record?

Answer 1

A:

If I had to select a random number for each row in SQL, and you could prove to me that RAND() is generating true random numbers...

Yes. I would probably use something like that.

Justin Niessner 2009-09-16 16:09:48

Answer 2

+1 A:

It depends on what you need the random value for. It also depends on the format that you need the value in: INTEGER, VARCHAR, etc.

If I need to sort rows randomly, I do something like:

SELECT *
FROM [MyTable]
ORDER BY newID()
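
As a small illustration of the same idea, the query can be capped with TOP to pull a random sample instead of the whole table. This is only a sketch: the sample size of 5 is arbitrary, and the TOP (n) syntax with parentheses assumes SQL Server 2005 or later.

```sql
-- Return 5 rows chosen in random order.
-- NEWID() is evaluated for every row, so the sort order changes on each run.
SELECT TOP (5) *
FROM [MyTable]
ORDER BY NEWID();
```

Note that this sorts the entire table by a fresh GUID, so it may be slow on large tables.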

Likewise, you could generate a table of ints using the identity "feature" of SQL Server and perform a similar query and that could give you a random number.

My colleague needed a random integer per row, so he added a computed column to our table that generates one random integer per row returned in a query. I'm not sure I recommend this; it caused issues in certain tools, but it gave a random integer for each row. We could then combine my newid() solution with that table and get a set of random numbers when needed.

So I return to "it depends". Can you elaborate on what you need it for?

Update: Here is the table definition snippet my colleague used to have a computed column return a different random number per row, each time the table is queried:

CREATE TABLE [dbo].[Table](
    -- ...
    [OrderID] [smallint] NOT NULL,  --Not sure what happens if this is null
    -- ...
    [RandomizeID]  AS (convert(int,(1000 * rand(([OrderID] * 100 * datepart(millisecond,getdate())))))),
    -- ...
)

Frank V 2009-09-16 16:12:34

At present it's quite academic, just a case of obtaining rows from a record set in a random way, in that different records are needed each time. Possibly weighted, but using [weight]*dbo.RandNumber() gives that. So simply put, a way to get a randomly generated number for each record, which is different each time you query the table.
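
A rough sketch of the weighted idea, under stated assumptions: the table dbo.MyTable and its [weight] column are hypothetical, and RAND(CHECKSUM(NEWID())) stands in for dbo.RandNumber() so that the snippet does not depend on any extra objects.

```sql
-- Pick one row, favouring rows with a larger [weight].
-- NEWID() is evaluated per row, so each row gets its own seed for RAND().
SELECT TOP (1) t.*
FROM dbo.MyTable AS t
ORDER BY t.[weight] * RAND(CHECKSUM(NEWID())) DESC;
```

Multiplying a weight by a uniform random value biases the choice towards heavier rows, but the selection probabilities are not exactly proportional to the weights.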

Dems 2009-09-16 16:18:32

I didn't specify SQL 2000 compatibility, but also (afaik) newID() doesn't return a random number as such. It is neither a number (usable for multiplying a weight by, for example) nor truly random, as it is based on the time, hardware, etc. But then, I don't know if that is any less random than the RAND() function.

Dems 2009-09-16 16:24:05

What is the calculation your colleague puts in the calculated field? I just tried using RAND() and got a different value on each execution, but the same value for every record...

Dems 2009-09-16 16:34:22

Answer 3

A:

I wouldn't use this. As far as I know, RAND() uses the system time as its seed and produces the same values when executed several times in quick succession. For example, try this:

SELECT    *, 
          RAND()
FROM      SomeTable

RAND() will give you the same value for each row.

Maximilian Mayerl 2009-09-16 16:15:43

That behaviour is not due to the proximity of the times. It is because RAND() is executed only once, not once per record. This example also uses RAND() but obfuscates it behind both a UDF and a VIEW, thus forcing it to recalculate every time. In my example all three records get different values, every time. How random they are, I'm not sure. But they're certainly not going to be the same (except by chance).
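
For readers without the original example to hand, a minimal sketch of what such a UDF/VIEW combination might look like. The view name dbo.vRandNumber is hypothetical, dbo.RandNumber follows the name mentioned earlier in the thread, and SomeTable is a placeholder. The view is needed because SQL Server does not allow RAND() to be called directly inside a user-defined function.

```sql
-- The view hides RAND() behind a SELECT.
CREATE VIEW dbo.vRandNumber
AS
SELECT RAND() AS RandNumber;
GO

-- The scalar UDF reads from the view; calling RAND() here directly is not allowed.
CREATE FUNCTION dbo.RandNumber()
RETURNS float
AS
BEGIN
    RETURN (SELECT RandNumber FROM dbo.vRandNumber);
END;
GO

-- The UDF is invoked for each row, so each row can get its own value.
SELECT t.*, dbo.RandNumber() AS r
FROM dbo.SomeTable AS t;
```

GO is the batch separator used by tools such as Management Studio and sqlcmd; it is needed because CREATE VIEW and CREATE FUNCTION must each start their own batch.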

Dems 2009-09-16 16:26:39

My query is not really about the merit of RAND(), but the merit of using the UDF/VIEW combo to force a recalculation for every row.

Dems 2009-09-16 16:27:40

Oh, I see. Thanks for the info on RAND() executing only once per record set, I didn't know this. Also, sorry for misunderstanding your question.

Maximilian Mayerl 2009-09-16 16:29:38

Answer 4

+2 A:

I would not do this for a piece of software that I wanted to keep working on future versions of SQL Server. I found a way to return different values from RAND() for each row in a select statement. This discovery 1) was a bit of a hack and 2) was made on SQL Server 2005. It no longer works on SQL Server 2008. That experience makes me extra leery of relying on trickery to get rand() to return a random value per row.

Also, I believe SQL Server is allowed to optimize away the multiple calls to a UDF ... though that might be changing since they do allow some non-deterministic functions now.

For SQL Server 2005 only, a way to force rand() to execute per row in a select statement. Does not work on SQL Server 2008. Not tested on any version prior to 2005:

create table #t (i int)
insert into #t values (1)
insert into #t values (2)
insert into #t values (3)

select i, case when i = 1 then rand() else rand() end as r
from #t

1   0.84923391682467
2   0.0482397143838935
3   0.939738172108974

Also, I know you said you were not asking about the randomness of rand(), but a good reference is: http://msdn.microsoft.com/en-us/library/aa175776%28SQL.80%29.aspx. It compares rand() to newid() and rand(FunctionOf(PK, current datetime)).

Shannon Severance 2009-09-16 16:57:18

CHECKSUM(NEWID()) at least works on SQL 2000+. This relies on a certain behaviour that may be removed in a SQL 2005 patch

gbn 2009-12-17 20:59:41

Answer 5

A:

The view and udf approach is clumsy for me: excess trivial objects to use a flawed function.

I'd use CHECKSUM(NEWID()) to generate a random number (rather than RAND() * xxx), or the new SQL Server 2008 function CRYPT_GEN_RANDOM.
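
A short sketch of the CHECKSUM(NEWID()) approach. SomeTable, the column aliases, and the 0 to 999 range are illustrative assumptions only.

```sql
SELECT t.*,
       -- Any int, positive or negative, different for each row.
       CHECKSUM(NEWID()) AS AnyInt,
       -- Clear the sign bit first, then reduce to the range 0..999.
       (CHECKSUM(NEWID()) & 2147483647) % 1000 AS ZeroTo999
FROM dbo.SomeTable AS t;
```

Masking the sign bit avoids wrapping the value in ABS(), which raises an overflow error if CHECKSUM happens to return the smallest possible int.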

gbn 2009-11-28 14:20:07

Wouldn't NEWID() still resolve to a constant in the same way as RAND()? So the view/udf combination would still be required? (It's the view/udf combination that's intrinsically in question, allowing what would normally be considered a constant expression to be re-evaluated for each record.)

Dems 2009-11-28 22:02:55

NEWID() is *per call*, not per statement. So it will be different per row.
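
A quick way to see the difference, sketched with a throwaway temp table (the table and column names are arbitrary):

```sql
CREATE TABLE #t (i int);
INSERT INTO #t VALUES (1);
INSERT INTO #t VALUES (2);
INSERT INTO #t VALUES (3);

-- PerRowGuid differs on every row.
-- PerStatementRand shows the same value on every row of a single execution.
SELECT i,
       NEWID() AS PerRowGuid,
       RAND()  AS PerStatementRand
FROM #t;

DROP TABLE #t;
```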

gbn 2009-11-29 09:21:42

Common answer of mine: http://stackoverflow.com/search?q=newid+RAND+user%3A27535

gbn 2009-11-29 09:24:22
