A colleague of mine discovered a behaviour in SQL Server which I was unaware of.

CREATE VIEW dbo.vRandNumber AS
SELECT RAND() as RandNumber
GO

CREATE FUNCTION dbo.RandNumber() RETURNS float AS
BEGIN
    -- A scalar UDF body must be wrapped in BEGIN ... END
    RETURN (SELECT RandNumber FROM dbo.vRandNumber)
END
GO

DECLARE @mytable TABLE (id INT)
INSERT INTO @mytable SELECT 1
INSERT INTO @mytable SELECT 2
INSERT INTO @mytable SELECT 3

SELECT *, dbo.RandNumber() FROM @mytable

This seems to be the quickest way of generating a 'random' value for each record in a data set. But I'm not completely sure whether it's the result of documented behaviour or takes advantage of a bizarre convergence of coincidences.

Would you use something like this?


EDIT

This isn't a question about the merits of the RAND() function itself, but the use of the UDF/VIEW combination to force it to recalculate on every row. (Using just RAND() in the final query, instead of dbo.RandNumber(), would give the same value for every record.)

Also, the point is for the value to be different every time you look at it, which enables random selection of records, for example.

EDIT

For SQL Server 2000+.

A: 

If I had to select a random number for each row in SQL, and you could prove to me that RAND() is generating true random numbers...

Yes. I would probably use something like that.

Justin Niessner
+1  A: 

It depends on what you need the random value for. It also depends on the format that you need the value in (INTEGER, VARCHAR, etc.).

If I need to sort rows randomly, I do something like:

SELECT *
FROM [MyTable]
ORDER BY newID()

Likewise, you could generate a table of ints using the identity "feature" of SQL Server and perform a similar query and that could give you a random number.

My colleague needed a random integer per row, so he added a calculated field to our table that generates one random number (integer) per row returned in a query. I'm not sure I recommend this; it caused issues in certain tools, but it gave a random integer for each row. We could then combine my solution of newid() and that table and get a set of random numbers when needed.

So I return to "it depends". Can you elaborate on what you need it for?

Update: Here is the table definition snippet my colleague used to have a computed column return a different random number per row, each time the table is queried:

CREATE TABLE [dbo].[Table](
    -- ...
    [OrderID] [smallint] NOT NULL,  --Not sure what happens if this is null
    -- ...
    [RandomizeID]  AS (convert(int,(1000 * rand(([OrderID] * 100 * datepart(millisecond,getdate())))))),
    -- ...
)
Frank V
At present it's quite academic, just a case of obtaining rows from a record set in a random way, in that different records are needed each time. Possibly weighted, but using [weight]*dbo.RandNumber() gives that. So simply put, a way to get a randomly generated value for each record, which is different each time you query the table.
Dems
I didn't specify sql-2000 compatible, but also (afaik) newID() doesn't return a random number as such. It is neither a number (used for multiplying a weight by, for example) nor truly random, as it is based on the time, hardware, etc. But then, I don't know if that is any less random than the RAND() function.
Dems
What is the calculation your colleague puts in the calculated field? I just tried using RAND() and got a different value on each execution, but the same value for every record...
Dems
A: 

I wouldn't use this. As far as I know, RAND() uses the system time as its seed and produces the same values when executed several times in quick succession. For example, try this:

SELECT    *, 
          RAND()
FROM      SomeTable

RAND() will give you the same value for each row.

Maximilian Mayerl
That behaviour is not due to the proximity of the times. It is because RAND() is executed only once, not once per record. This example also uses RAND() but hides it behind both a UDF and a VIEW, thus forcing it to recalculate every time. In my example all three records get different values, every time. How random they are, I'm not sure. But they're certainly not going to be the same (except by chance).
Dems
My query is not really about the merit of RAND(), but the merit of using the UDF/VIEW combo to force a recalculation for every row.
Dems
Oh, I see. Thanks for the info on RAND() executing only once per record set; I didn't know this. Also, sorry for misunderstanding your question.
Maximilian Mayerl
+2  A: 

I would not do this for a piece of software that I wanted to keep working on future versions of SQL Server. I found a way to return a different value from RAND() for each row in a select statement. This discovery 1) was a bit of a hack and 2) was made on SQL Server 2005. It no longer works on SQL Server 2008. That experience makes me extra leery of relying on trickery to get rand() to return a random value per row.

Also, I believe SQL Server is allowed to optimize away the multiple calls to a UDF ... though that might be changing since they do allow some non-deterministic functions now.

For SQL Server 2005 only, a way to force rand() to execute per row in a select statement. Does not work on SQL Server 2008. Not tested on any version prior to 2005:

create table #t (i int)
insert into #t values (1)
insert into #t values (2)
insert into #t values (3)

select i, case when i = 1 then rand() else rand() end as r
from #t

1   0.84923391682467
2   0.0482397143838935
3   0.939738172108974

Also, I know you said you were not asking about the randomness of rand(), but a good reference is: http://msdn.microsoft.com/en-us/library/aa175776%28SQL.80%29.aspx. It compares rand() to newid() and rand(FunctionOf(PK, current datetime)).

Shannon Severance
CHECKSUM(NEWID()) at least works on SQL 2000+. This relies on a certain behaviour that may be removed in a SQL 2005 patch
gbn
A: 

The view and udf approach is clumsy for me: excess trivial objects to use a flawed function.

I'd use CHECKSUM(NEWID()) to generate a random number (rather than RAND() * xxx), or the new SQL Server 2008 CRYPT_GEN_RANDOM.
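
As a rough illustration of the CHECKSUM(NEWID()) suggestion, here is a minimal sketch. The table variable @demo and the column aliases are made-up names for the example, not anything from the question. The modulo is applied before ABS() so that the smallest possible integer cannot cause an overflow in ABS().

```sql
-- Sketch only: @demo and the column aliases are hypothetical names.
DECLARE @demo TABLE (id INT)
INSERT INTO @demo SELECT 1
INSERT INTO @demo SELECT 2
INSERT INTO @demo SELECT 3

SELECT id,
       CHECKSUM(NEWID())                      AS RandInt,       -- any signed 32-bit integer
       ABS(CHECKSUM(NEWID()) % 1000)          AS RandBelow1000, -- integer from 0 to 999
       ABS(CHECKSUM(NEWID()) % 1000) / 1000.0 AS RandFraction   -- 0.000 to 0.999 in steps of 0.001
FROM @demo
```

Each NEWID() call is evaluated separately, so the three columns are independent of one another and differ from row to row. The fractional form could stand in for dbo.RandNumber() where a multiplier for a weight is wanted, at the cost of coarser granularity than a float.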

gbn
Wouldn't NEWID() still resolve to a constant in the same way as RAND()? So the view/udf combination would still be required? (It's the view/udf combination that's intrinsically in question, allowing what would normally be considered a constant expression to be re-evaluated for each record.)
Dems
NEWID() is *per call*, not per statement. So it will be different per row.
gbn
Common answer of mine: http://stackoverflow.com/search?q=newid+RAND+user%3A27535
gbn