views:

423

answers:

3

I was looking at postgres question and it called for updating a table with random values.

I only noticed it was targeted at postgres after answering it, but in answering it I hit a mystery.

Here is my sample code in question:

create table #Buildings([Use] varchar(50), n int)

insert #Buildings
select null,null from sysobjects 

update #Buildings
set [Use] = 
     case (ABS(CAST(CAST(NEWID() AS VARBINARY) AS int)) % 6)
      when 0 then null
      when 1 then 'warehouse'
      when 2 then 'office'
      when 3 then 'market'
      when 4 then 'retail'
      when 5 then 'workshop'
      else 'HOW IS THIS POSSIBLE'
     end,
    n = ABS(CAST(CAST(NEWID() AS VARBINARY) AS int)) % 6

select [Use], count(*) from
#Buildings
group by [Use] 

select n, count(*) from
#Buildings
group by n

It returns a very odd result set:

Use                                                
-------------------------------------------------- -----------
workshop                                           128
HOW IS THIS POSSIBLE                               633
NULL                                               287
retail                                             140
warehouse                                          258
market                                             177
office                                             209

And a second result set that makes complete sense:

n           
----------- -----------
0           292
3           300
1           313
4           277
5           311
2           339

The data for both result sets was generated in the same update statement.

So my question is why is a number out of the range of 0 - 5 hitting my case statement? What is that number? Why when I update the int directly is stuff distributed properly?

+5  A: 

I have an idea:

maybe the value is being computed on every when statement, if it gets to the first one, computes a value other than 0 it goes to the second, it then computes a value other than 1, then the next and so on, if it makes it's way to 5 and then computes a number other than 5 it goes to else.

this would also explain why there are so many more "HOW IS THIS POSSIBLE" than the others, the others are incremented when the current position and the random number generated at that position match.

your thoughts?

edit: also i've tested in sql server express 2008 and it's the same outcome.

John Boker
I think you are on to something, that makes sense and correlates with the results
Sam Saffron
this bothered me for about 20 minutes, had to open ssms and everything to see if i could find a good answer.
John Boker
Interesting suggestion!
Alexander Prokofyev
@John Boker: interesting observation. Seems a bug if expressions are evaluated on-the-fly for each WHEN
Michael Buen
A: 

John is likely right... though it's worrying the "WHEN" would be so inefficient.

Also note that the NewId() function returns a new value each time you call it. You're calling it twice (not counting the "when" wierdness). The following is more stable: drop table #Buildings go

create table #Buildings([Use] varchar(50), n int)

insert #Buildings select null,null from sysobjects

declare @nid as uniqueidentifier

update #Buildings set @nid = newid(), [Use] = case (ABS(CAST(CAST(NewId() AS VARBINARY) AS int)) % 6) when 0 then null when 1 then 'warehouse' when 2 then 'office' when 3 then 'market' when 4 then 'retail' when 5 then 'workshop' else 'HOW IS THIS POSSIBLE' end, n = ABS(CAST(CAST(NewId() AS VARBINARY) AS int)) % 6

select [Use], count(*) from

Buildings

group by [Use]

select n, count(*) from

Buildings

group by n

+1  A: 

create table #Buildings([Use] varchar(50), n int)
insert #Buildings select null,null from sysobjects

declare @i int

update #Buildings
set 
    @i = (ABS(CAST(CAST(NEWID() AS VARBINARY) AS int)) % 6),
    [Use] = 
        case @i -- if the expression is evaluated on-the-fly for each WHEN, MSSQL variable-in-SQL capability will alleviate this problem
                when 0 then null
                when 1 then 'warehouse'
                when 2 then 'office'
                when 3 then 'market'
                when 4 then 'retail'
                when 5 then 'workshop'
                else 'HOW IS THIS POSSIBLE'
        end,
    n = @i

select [Use], count(*) from
#Buildings
group by [Use] 

select n, count(*) from
#Buildings
group by n
Michael Buen
yerp this is a good workaround for the problem
Sam Saffron