ansaurus

Question

Ensuring that items that have been viewed are not seen again

Answer 1

+1 A:

Use a floating point number between 0 and 1 instead of an integer. It has a nice domain, which doesn't change with the number of entities you have, and doubles have a 52 bit mantissa, which gives us approximately 2^26 objects before we can expect a collision; substantially more than what you're dealing with.

Nick Johnson 2009-05-31 15:38:02

Answer 2

+1 A:

I don't think you need to "choose 30 at random" out of your 100 question pool. The 100 were selected randomly to begin with - if you take the first 30, those will already be randomized. Your code will be simpler, with no less randomness.

Peter Recore 2009-06-08 19:51:38

Very good point.The idea there was that, for smaller pools, 300 problems, say, I didn't want people to have the sense that they were going through approximately the same problem sets once they wrap around. So it was an extraneous detail in this case, and it (kind of) goes against the question being asked, i.e., what is an efficient way to prevent people from seeing the same problems over again? But I didn't want to make the original question too complex by throwing in too many considerations.

Eric W. 2009-06-09 16:23:31

ahhh. Ultimately it depends on the type of randomness you want the users to experience. With the 'pick 30 out of subpools of 100' method, each time they wrap around the main pool, they will likely see question X 4 or 5 times before they see question Y, and there will be some questions they never see at all. If you go through the entire 300 in a random order, they will see every question 1 time before seeing any duplicates. To keep it interesting for users, you might create multiple whole-pool orderings, so that there might be 5 ways to go through the 300 items.

Peter Recore 2009-06-23 18:08:23

Answer 3

+1 A:

Put the items in some order and give each one a number. Maintain a counter for the user which starts at zero. Generate a random(ish) key for each user. Choose a function which maps the combination of an integer in the range 0...N and a key to another integer in the range 0...N, such that for any value off the key, the mapping between integers is bijective. When you need an item, take the value of the counter, put it through the function along with the key, and use that number to index into the list of items. Increment the counter.

All you need to store for each user is the key and the counter, and you have a guarantee that no item will ever be repeated. It's basically a way of constructing a huge permutation on the fly.

The problem is of course choosing the function!

A simple one would be f(counter, key) = (counter + key) mod N, and whilst this would work, it doesn't randomise the items at all, so everyone will get the items in the same order, just starting at different places. If N + 1 is prime, you can use F(counter, key) = (((counter + 1) * (key + 1)) mod (N + 1)) - 1, which would work pretty well. If the range was 0...2^64, you could use any block cipher which has a 64-bit block, which would give excellent randomisation. But it's not clear that you could adapt this to smaller sizes.

I'll have a bit of a dig to see if i can come up with a generically applicable function. This is a problem i've come across myself, and it would be great to finally have a good answer. I'll edit this answer if i find anything.

EDIT: Okay, here we go! I got the key ideas from a thread i started on sci.crypt, and in particular from one Paul Rubin, who is an all-round usenet hero.

Slight change of strategy. Put your items in a list, so they can be accessed by an index. Pick a number B, such that 2^B >= N - any value will do, but you really want the smallest one (ie the ceiling of the base-2 logarithm of N-1). We then refer to 2^B as M. Set up a counter which will run from 0 to M-1, and a key for each user, which can be an arbitrary byte string - a random integer is probably simplest. Set up the Magic Function, which is a bijection, aka a permutation, on the set of integers 0 ... M-1 (see below!). When you want an item, take the value of the counter, increment the counter, then put the original value through the Magic Function to get an index; if the index is greater than N-1, then throw it away, and repeat the process. Repeat until you get an index that is less than N. Sometimes, you'll need to go through one or more useless indices before you get a good one, but on average, it will take M/N tries, which is not too bad (that's less than 2 if you chose the smallest possible B).

Incidentally, calculating the ceiling of the base 2 logarithm of a number is easy:

int lb(int x) {
    int lb = 0;
    while (x > 0) {
        ++lb;
        x >>= 1;
    }
    return lb;
}

Okay, so the Magic Function, which maps a number from 0 ... M-1 to another such number. I mentioned block ciphers above, and that's what we're going to use. Except that our block is B bits long, which is variable and smaller than the normal 64 or 128 bits. So we need a cipher that works on small, variable-sized blocks. So we're going to write our own - a variable-sized mildly-unbalanced Feistel cipher. Easy!

To do a Feistel cipher, you need:

Numbers B_L and B_R such that B_L + B_R = B. In a balanced Feistel cipher, B_L = B_R, so B has to be even; we'll use B_L = B/2, and B_R = B - B_L, ie similar to a balanced network, but with a slightly bigger B_R when B is odd.
A number of rounds, called C; we'll count rounds with i, which will go from 0 to C-1. I've been told that 4 and 10 are absolute minimums, so let's go with 12 to be on the safe side.
A key schedule, which is just a key for each round, each called K_i, all derived from the main key i mentioned above. You could just use the same key for each round; proper ciphers have some way of generating subkeys from the master key. I'd suggest just concatenating the round number to the master key.
A round function, called F, which takes a B_R-bit sub-block and a round key, and produces a B_L-bit sub-block; within those constraints, it can be absolutely anything. This is the heart of a Feistel cipher. For us, the best option is just to use an already-existing cryptographic hash function, as this is easy and reliable. SHA-1 is the current favourite. We'll feed it the round key, followed by the input sub-block, compute the hash, then take B_L bits from one end (doesn't matter which) as our output.

A Feistel cipher works like this:

Take a B-bit input word
For i from 0 to C-1:
1. Split the word into left and right sub-blocks L and R, with B_L and B_R bits respectively
2. Put R and K_i through the round function F to produce a ciphering value X
3. Add X to L, discarding any carry bit which overflows from the top bit
4. Reassemble L and R the wrong way round, with R on the left and L on the right, to form a new value for B

The final value for B is the encrypted value, and is the output from the cipher. Decrypting it is pretty simple - do the above in reverse - but since we don't need to decrypt, don't worry about that.

So there you go. Maintain a counter (and a key, and the value of M), encrypt its value with a tiny cipher, and use the result as an index. Given that the cipher is a permutation, it's easy to show that this will never repeat, which should keep your clients happy. Even better, given the cryptographic properties of the cipher, you can also claim that the students will be unable to predict which question will come next (which is probably not important, but sounds cool).

All this is a bit more complicated than just incrementing a counter and giving them the items in order, but it's not that hard. You could do it in a hundred lines of java. Well, i can do it in a hundred lines of java - i don't know about you! :)

Incidentally, this will work with a growing pool of items, provided that you always add items at the end, although it will never pick items numbered higher than M. In many cases, though, that will give you some room for growth.

Tom Anderson 2009-06-08 20:27:04

Very interesting approach. I'm definitely interested in seeing what you come up with. In my case I'll need to be able to assure educational organizations that, when an exam is administrated, the participant will not be going through problems he or she has encountered before, so this is a pretty important point. I suppose I can divide the pool into two pools, practice questions and test questions, but I'd prefer to draw from the same pool.

Eric W. 2009-06-09 16:31:01

One question I had is whether it is possible to avoid assuming a fixed pool size for this approach. Can it be made to work when items are inserted in the middle of a pool or added to the end?

Eric W. 2009-06-10 05:23:56

Okay, custom small-block Feistel cipher with a hash as the round function, plus a modified Hasty Pudding trick. Can be made to work for a growing pool provided the items are only added at the end. Will explain in detail tomorrow!

Tom Anderson 2009-06-11 23:03:32

ansaurus

tags:

views:

answers:

Ensuring that items that have been viewed are not seen again

related questions