views:

100

answers:

6

The intention of following (simplified) code fragment is to return one random row. Unfortunatly, when we run this fragment in the query analyzer, it returns between zero and three results.

As our input table consists of exactly 5 rows with unique ID's and as we perform a select on this table where ID equals a random number, we are stumped that there would ever be more than one row returned.

Note: among other things, we already tried casting the checksum result to an integer with no avail.

DECLARE @Table TABLE (
  ID INTEGER IDENTITY (1, 1)
  , FK1 INTEGER
)

INSERT INTO @Table
SELECT 1
UNION ALL SELECT 2
UNION ALL SELECT 3
UNION ALL SELECT 4
UNION ALL SELECT 5

SELECT  *
FROM    @Table 
WHERE   ID = ABS(CHECKSUM(NEWID())) % 5 + 1

Edit

Our usage scenario is as follows (please don't comment on wether it is the right thing to do or not. It's the powers that be that have decided)

Ultimately, we must create a result with realistic values where the combination of producer and weights are obfuscated by selecting at random existing weights from the table itself.
The query then would become something like this (also a reason why RAND can not be used)

SELECT  t.ID
        , FK1 = (SELECT FK1 FROM @Table WHERE ID=ABS(CHECKSUM(NEWID())) % 5 + 1)
FROM    @Table t

Because the inner select could be returning zero results, it would return a NULL value wich again is not acceptable. It is the investigation of why the inner select returns between zero and x results, that this question sproused (is this even English?).

Answer

What turned the light on for me was the simple observation that ABS(CHECKSUM(NEWID())) % 5 + 1) was re-evaluated for each row. I was under the impression that ABS(CHECKSUM(NEWID())) % 5 + 1) would get evaluated once, then matched.

Thank you all for answering and slowly but surely leading me to a better understanding.

A: 

Try the following, so you can see what happens:

SELECT  ABS(CHECKSUM(NEWID())) % 5 + 1 AS Number, @Table.*
FROM    @Table 
WHERE   ID = Number
Sjoerd
@Sjoerd, the query you've presented does not run as it stands but it gave us something new to try.
Lieven
+2  A: 

A bit of a guess, and not sure that SQL works this way, but wouldn't SQL evaluate "ABS(CHECKSUM(NEWID())) % 5 + 1" for each row in the table? If so, then each evaluation may or may not return the value of ID of the current row.

Try this instead - generating the random number explicitly first, and matching on that single value:

declare @targetRandom int
set @targetRandom = ABS(CHECKSUM(NEWID())) % 5 + 1

select * from @table where ID = @targetRandom
Neil Moss
@Neil, +1. You and Daniel seemed to have come up with the same solution but I still fail to *understand* why the select could return more than 1 row.
Lieven
@Neil: re-reading your answer, you were also spot on.
Lieven
+8  A: 

The reason this happens is because NEWID() gies a different value for each row in the table. For each row, independently of the others, there is a one in five chance of it being returned. Consequently, as it stands, you actually have a 1 in 3125 chance of all 5 rows being returned!

To see this, run the following query. You'll see that each row gets a different ID.

SELECT  * , NEWID()
FROM    @Table  

This will fix your code:

DECLARE @Id int
SET @Id = ABS(CHECKSUM(NEWID())) % 5 + 1

SELECT  * 
FROM    @Table  
WHERE   ID = @Id

However, I'm not sure this is the most efficient method of selecting a single random row from the table.

You might find this MSDN article useful: http://msdn.microsoft.com/en-us/library/Aa175776 (Random Sampling in T-SQL)

EDIT 1: now I think about it, this probably is the most efficient way to do it, assuming the number of rows remains fixed and the IDs are guaranteed to be contiguous.

EDIT 2: to achieve the desired result when used as a sub-query, use TOP 1 like this:

SELECT  t.ID 
        , FK1 = (SELECT TOP 1 FK1 FROM @Table ORDER BY NEWID()) 
FROM    @Table t
Daniel Renshaw
You where faster, I was writing the same example so +1
Don
@Daniel, you'r example works for a single select but for our requirement, this would require a cursor where @Id always get's a new value. We would prefer a set based approach. I have updated the question to reflect this.
Lieven
@Daniel, also, I am under the impression you *know* whe multiple results could be returned but I don't grasp it. How do you get to the 1 in 3125 chance of all 5 rows being returned?
Lieven
Daniel Renshaw
@Lieven, I've added an example you can run to help explain why you get multiple rows.
Daniel Renshaw
+1  A: 

Or you could use RAND() instead of NEWID(), which is only evaluated once per query in MS SQL

If you want to use CHECKSUM to obtain a random row, this is the way to do it.

SELECT TOP 1 *
FROM @Table
ORDER BY CHECKSUM(NEWID())

what about?

SELECT  t.ID 
        , FK1 = (SELECT TOP 1 FK1 FROM @Table ORDER BY NEWID()) 
FROM    @Table t 
Chris Bednarski
@Chris Bednarski: RAND() wouldn't work for us. I have updated the question to further explain our usage scenario.
Lieven
@Lieven: updated my answer
Chris Bednarski
@Lieven: edited my answer again
Chris Bednarski
@Chris, we know about the ORDER BY NEWID(). It would work for us (+1 for that btw) but I'd like to understand why there is more than one row returned with the given example.
Lieven
+1  A: 

This may help you understand the reasons. Run the query multiple times. How many times does MY_FILTER = ID ?

SELECT  *, ABS(CHECKSUM(NEWID())) % 5 + 1 AS MY_FILTER
FROM    @Table

SELECT  *, ABS(CHECKSUM(NEWID())) % 5 + 1 AS MY_FILTER
FROM    @Table

SELECT  *, ABS(CHECKSUM(NEWID())) % 5 + 1 AS MY_FILTER
FROM    @Table
Chris Bednarski
MY_FILTER is *always* equal to **exactly one** ID. Not per sé the ID of the row in question but I don't see how that would pose a problem in the *simple* `SELECT * FROM @Table WHERE ID = ABS(CHECKSUM(NEWID())) % 5 + 1`.
Lieven
The WHERE is evaluated for each row and NEWID returns a different value each time it is evaluated.
Daniel Renshaw
<ting>... that turned on the light. Thanks/
Lieven
A: 

I don't know how much this will be helpful to you, but try this.. All I understood is you want one random row each time you execute the query..

select top 1 newid() as row,ID from @Table order by row

Here is the logic. Each time you execute the query a newid is being assigned to each row and all are unique and the you just order them with the new uniquely generated rowid. Then all you need to do is select the top most or whatever you want..

Gourav C