views:

748

answers:

6

I'm trying to randomly insert values from a list of pre-defined values into a table for testing. I tried using the solution found on this StackOverflow question:

stackoverflow.com/.../update-sql-table-with-random-value-from-other-table

When I I tried this, all of my "random" values that are inserted are exactly the same for all 3000 records.

When I run the part of the query that actually selects the random row, it does select a random record every time I run it by hand, so I know the query works. My best guesses as to what is happening are:

  • SQL Server is optimizing the SELECT somehow, not allowing the subquery to be evaluated more than once
  • The random value's seed is the same on every record the query updates

I'm stuck on what my options are. Am I doing something wrong, or is there another way I should be doing this?

This is the code I'm using:

DECLARE @randomStuff TABLE ([id] INT, [val] VARCHAR(100))

INSERT INTO @randomStuff ([id], [val]) 
VALUES ( 1,  'Test Value 1' )
INSERT INTO @randomStuff ([id], [val])
VALUES ( 2,  'Test Value 2' )
INSERT INTO @randomStuff ([id], [val])
VALUES ( 3,  'Test Value 3' )
INSERT INTO @randomStuff ([id], [val])
VALUES ( 4,  'Test Value 4' )
INSERT INTO @randomStuff ([id], [val])
VALUES ( 5,  'Test Value 5' )
INSERT INTO @randomStuff ([id], [val])
VALUES ( 6,  null )
INSERT INTO @randomStuff ([id], [val])
VALUES ( 7,  null )
INSERT INTO @randomStuff ([id], [val])
VALUES ( 8,  null )
INSERT INTO @randomStuff ([id], [val])
VALUES ( 9,  null )
INSERT INTO @randomStuff ([id], [val])
VALUES ( 10, null )

UPDATE MyTable
SET MyColumn = (SELECT TOP 1 [val] FROM @randomStuff ORDER BY NEWID())
+1  A: 

I don't have time to check this right now, but my gut tells me that if you were to create a function on the server to get the random value that it would not optimize it out.

then you would have

UPDATE MyTable
Set MyColumn = dbo.RANDOM_VALUE()
tster
A: 

There is no optimization going on here.

Your using a subquery that selects a single value, there is nothing to optimize.

You can also try putting a column from the table your updating in the select and see if that changes anything. That may trigger an evaluation for every row in MyTable

UPDATE MyTable
SET MyColumn = (SELECT TOP 1 [val] FROM @randomStuff ORDER BY NEWID()
    WHERE MyTable.MyColumn = MyTable.MyColumn )
jfar
I just tried that. Had no luck changing the results.
Dan Herbert
+2  A: 

use a cross join to generate random data

Do you have an example I could use? I'm not familiar with the idea of cross joins.
Dan Herbert
A: 

I came up with a solution which is a bit of a hack and very inefficient (10~ seconds to update 3000 records). Because this is being used to generate test data, I don't have to be concerned about speed however.

In this solution, I iterate over every row in the table and update the values one row at a time. It seems to work:

DECLARE @rows INT 
DECLARE @currentRow INT

SELECT @rows = COUNT(*) FROM dbo.MyTable
SET @currentRow = 1

WHILE @currentRow < @rows
BEGIN 

UPDATE MyTable
SET MyColumn = (SELECT TOP 1 [val] FROM @randomStuff ORDER BY NEWID())
WHERE MyPrimaryKey = (SELECT b.MyPrimaryKey
 FROM(SELECT a.MyPrimaryKey, ROW_NUMBER() OVER (ORDER BY MyPrimaryKey) AS rownumber
      FROM MyTable a) AS b
 WHERE @currentRow = b.rownumber
)

SET @currentRow = @currentRow + 1
END
Dan Herbert
+1  A: 

I've had a play with this, and found a rather hacky way to do it with the use of an intermediate table variable.

Once @randomStuff is set up, we do this (note in my case, @MyTable is a table variable, adjust accordingly for your normal table):

DECLARE @randomMappings TABLE (id INT, val VARCHAR(100), sorter UNIQUEIDENTIFIER)

INSERT INTO @randomMappings 
SELECT M.id, val, NEWID() AS sort 
FROM @MyTable AS M 
CROSS JOIN @randomstuff

so at this point, we have an intermediate table with every combination of (mytable id, random value), and a random sort value for each row specific to that combination. Then

DELETE others FROM @randomMappings AS others 
INNER JOIN @randomMappings AS lower 
ON (lower.id = others.id) AND (lower.sorter < others.sorter)

This is an old trick which deletes all rows for a given MyTable.id except for the one with the lower sort value -- join the table to itself where the value is smaller, and delete any where such a join succeeded. This just leaves behind the lowest value. So for each MyTable.id, we just have one (random) value left.. Then we just plug it back into the table:

UPDATE @MyTable
SET MyColumn = random.val
FROM @MyTable m, @randomMappings AS random
WHERE (random.id = m.id)

And you're done!

I said it was hacky...

Cowan
+4  A: 

When the query engine sees this...

(SELECT TOP 1 [val] FROM @randomStuff ORDER BY NEWID())

... it's all like, "ooooh, a cachable scalar subquery, I'm gonna cache that!"

You need to trick the query engine into thinking it's non-cachable. jfar's answer was close, but the query engine was smart enough to see the tautalogy of MyTable.MyColumn = MyTable.MyColumn, but it ain't smart enough to see through this.

UPDATE MyTable
   SET MyColumn = (SELECT TOP 1 val
                     FROM @randomStuff r
                          INNER JOIN MyTable _MT
                                  ON MT.Id = _MT.Id
                    ORDER BY NEWID())
 FROM MyTable M

By bringing in the outer table (MT) into the subquery, the query engine assumes subquery will need to be re-evaluated. Anything will work really, but I went with the (assumed) primary key of MyTable.Id since it'd be indexed and would add very little overhead.

A cursor would probably be just as fast, but is most certainly not as fun.

Alex Papadimoulis
OK, I can't remember if you can do the INNER JOIN like that in SQL Server 2000, but there is a way around it, which I used all the time before going 2005. That was too many years ago for me to remember, though. But this should work in 2005 and later just fine.
Alex Papadimoulis
That worked excellently. Thank you!
Dan Herbert