views:

82

answers:

6

I am working on a survey application. I have a sql server database holding upto 3000 survey questions. I call database for a set of 10 questions. I have to make sure that none of the of the questions are repeated for the longest time possible. What is the best possible approach I can take here? Write a custom randomization algorithm or SQL server provides some functionality.

+1  A: 

Could you simply keep track of the last question id retrieved. Then always retrieve the next question with a higher id than the previous. If nothing is returned, get the first/lowest.

It is not going to scale well, but then it doesn't sound like you need to scale.

Mr Jacques
+1, this is the easiest way
KM
A: 
SELECT question_id
FROM questions 
ORDER BY NEWID()

gives you all the question ids in random order. store the list, and work your way down.

There isn't a way to ensure the longest time between repeats without storing the order. You could make it another column on the same table...

EDIT: a variation on Mr Jacques' idea:

how about a column named random? Shuffle the order like so:

update questions set random = (
    SELECT question_id
    FROM questions 
    ORDER BY NEWID()
)

then get each new question like this:

select * where random = 'prev_question_id';

Then you could shuffle the order at intervals, every night maybe.

Colin Pickard
That won't ensure that questions aren't repeated for the longest possible time, per the OP.
OMG Ponies
apologies; I submitted a bit too quick - hope this makes more sense now
Colin Pickard
A: 

In reality, getting a random set is only half the problem. What you really want is a new random set that contains entries that haven't been selected before.

Here is an article by Jeff Atwood that talks about "shuffling" that may help you out: Shuffling and a follow up article The Danger of Naïveté.

Jordan S. Jones
A: 

You could choose questions based on a combination of a random generator and a frequency counter on the question itself.

If you always needed 10 questions, start with a set of questions that has the lowest use frequency, then apply a random number generator to pick from that set. Increment the frequency of those questions. These most recently asked questions have a higher frequency number than the rest of the pool, so these are ignored in the next pass.

The process continues until you have all of the questions stepped to the next frequency, and the whole pool becomes available again.

Bill
A: 

This kind of sounds like homework.. If the emphasis really is on making sure the question is never repeated until absolutely necessary, then I would suggest adding a column or another table with a TimesUsed field.

At first you would have all 3000 records with a TimesUsed of 0. Grab the top 10

SELECT TOP 10 QUESTIONS WHERE TimesUsed = (SELECT MIN(TimesUsed) FROM Questions)

and UPDATE the table setting these question's TimesUsed to TimesUsed +1.

This is a simple approach, and you'd need to account for if you didn't have an amount of questions evenly divisble by 10, but I'll leave that to you :)

Of course none of this really matters if you aren't deadset on never repeating one until they've all been used.

Mike M.
A: 

Write a custom randomization algorithm. What you describe is business logic.

If possible add 2 columns to the table (or keep a separate table with the unique identifier from the questions table ... totally depends on your normalization preferences):

Ask Sequence int

Asked bit

Write a stored proc that does the following:

For each record in the questions table

  1. set Ask Sequence to a randomly generated number between 1 and total number of questions * 10
  2. set Asked to OFF (0)

Write a 2nd stored proc that gets the next NOT asked question in order of Ask Sequence and marks the question being returned as Asked (1).

If there are no more questions that have not been asked, call the 1st stored proc to shuffle and then get the next NOT asked question.

DaveWilliamson