ansaurus

Question

Answer 1

A:

RandNumber is a function. Functions in SQL must output the same result every time for the same input, unless underlying database data has changed. This is the mathematical definition of a function (as opposed to how a normal programming language treats a "function," which is more of a function-like construct).

Since the result of your function should never change during an update statement (which is an atomic operation) the query plan compiler for your query only calls RandNumber once and then caches the result.

You may be able to get away with just referencing RAND directly in your query, but if that still doesn't work, you'll have to do this iteratively in a stored procedure.

David Pfeffer 2010-03-04 15:41:35

OK, but I've heard that when you do a getdate() as part of a where in a select that getdate() will run once for every row in the select. Is that not correct?

jcollum 2010-03-04 15:44:03

In MS-SQL Functions can be deterministic (as you described) or non-deterministic see http://msdn.microsoft.com/en-us/library/aa214775%28SQL.80%29.aspx the RAND function is non deterministic.

David Waters 2010-03-04 15:49:01

In this case, nearly every function will be run deterministically because `update` is atomic. Therefore, even if a normally non-deterministic function is called, the schema-bound nature of it is negated by the fact that the underlying data cannot change during the query execution. I think this **has** to be done iteratively unless you can force the query plan compiler to treat the function as non-deterministic despite the fact that it won't want to do so.

David Pfeffer 2010-03-04 15:52:45

@David: excellent explanation, thanks.

jcollum 2010-03-04 16:24:40

Answer 2

A:

How about

update names set nickname = null
where  abs(checksum(nickname) % 2) = 0

Hogan 2010-03-04 15:42:58

Nope, I ran the update with your where and it updated all the rows every time I ran it.

jcollum 2010-03-04 15:47:47

@jcollumn, how does the odd/even one do?

Hogan 2010-03-04 15:48:55

@jcollumn - ok I tested this one, it will work... warning null name or "constant" names will all do the same thing so it is not totally random.

Hogan 2010-03-04 15:53:57

actually that works well because the nickname column is being randomized somewhere else, using C# code so it'll be more random

jcollum 2010-03-04 16:02:51

yes I saw that in your question -- as @Quassnoi points out you can use newid() to get a random string and checksum that, but given your dataset this is faster.

Hogan 2010-03-04 16:04:26

Answer 3

A:

try something like this:

WHERE DATEPART(ms,CreateDate)>500

where "CreateDate" is a column already in the table that has an actual date and time in it. Tthe millisecond should be fairly random

EDIT here's another approach:

DECLARE @YourTable table (RowID int, RowValue varchar(5))
INSERT INTO @YourTable VALUES (1,'one')
INSERT INTO @YourTable VALUES (2,'two')
INSERT INTO @YourTable VALUES (3,'three')

SELECT 
    RAND(row_number() over(order by RowID)+DATEPART(ms,GETDATE())),* 
    FROM @YourTable

OUTPUT run 1:

                       RowID       RowValue
---------------------- ----------- --------
0.716200609189072      1           one
0.71621924216033       2           two
0.716237875131588      3           three

(3 row(s) affected)

OUTPUT run 2:

                       RowID       RowValue
---------------------- ----------- --------
0.727007732518828      1           one
0.727026365490086      2           two
0.727044998461344      3           three

(3 row(s) affected)

KM 2010-03-04 15:50:04

Answer 4

+4 A:

Assuming your Names table has a primary key field called Id, this will nullify the nickname in a random 50 percent of the rows:

update dbo.Names set Nickname = null where Id in
(
    select top 50 percent id from dbo.Names order by NEWID()
)

Ian Nelson 2010-03-04 15:51:09

This works but is inefficient -- generating a new GUID for each row will take a while longer than generating a single integer for the row.

David Pfeffer 2010-03-04 15:55:07

Pfft, c'mon! Is super performance really an issue for this kind of use case? Talk about premature optimization.I've just tried this approach on a table with 182,770 rows and it ran in 11 seconds.

Ian Nelson 2010-03-04 16:04:25

Well, it's close. But it will always update the same number of rows. I think I need a loop to update a random number of rows.

jcollum 2010-03-04 16:08:10

Yeah, the performance is fine, I got 655k rows in 25s. This isn't the kind of thing that will be in prod code.

jcollum 2010-03-04 16:09:14

OK, so add a line or two beforehand to randomly determine how many rows to update, between some range, and use that absolute figure instead of the "50 percent".

Ian Nelson 2010-03-04 16:10:09

I still think its the best solution and I still upvoted it. I'm just pointing out there's probably a better way.

David Pfeffer 2010-03-04 16:40:06

I don't think it's entirely what I want, but it does the job. I tried doing this the right way (getting all the ids, then wiping data in each row based on a coin toss, one at a time) and the performance was awful.

jcollum 2010-03-04 18:31:48

Answer 5

A:

RAND() (and GetDate/CURRENT_TIMESTAMP) is evaluated once per statement. You need someway to get around that. One way is (if you have a convenient row valued integer, e.g. an ID column), is to call RAND(ID) instead.

Damien_The_Unbeliever 2010-03-04 15:51:09

Still only called once.

Hogan 2010-03-04 15:57:54

Um, no. RAND(ID), where ID varies on a per-row basis, is called once per row.

Damien_The_Unbeliever 2010-03-04 18:50:40

Answer 6

A:

RAND() persists within a query.

SELECT  RAND()
FROM    names

will give you a set of equal numbers.

You need to do something like this:

WITH    q AS
        (
        SELECT  *,
                ABS(CHECKSUM(NEWID())) % 2 AS r
        FROM    names
        )
UPDATE  q
SET     nickname = NULL
WHERE   r = 0

Quassnoi 2010-03-04 15:51:26

ansaurus

tags:

views:

answers:

how can I update rows at random?

related questions