views:

104

answers:

5

During testing, I'm stuck with testing a piece of code that receives a list of numbers and that's supposed to return a new random key that doesn't exist in the list. The valid range is any number between 1 and 1,000,000 - which makes it too hard to brute-force in tests.

What's the best approach for testing this? I've considered testing with a smaller range (say, 100) but that too, given the basic randomizing algorithms, will take too long once the list gets close to its maximal size.

A: 

What means to long? Comparing a value against 1.000.000 values in a list should only take a few milliseconds. And I can see no other solution then comparing against all values besides the list is sorted and you can narrow the range to inspect. You could of course sort the list and then perform a binary search taking no more then 20 steps, but sorting would be munch more expensive then a linear search.

I just did a test on a quite slow pc and it took about 20 ms to scan a list with 1.000.000 numbers for a given number in C#. Using an array it took 14 ms. Isn't that fast enough? A binary search did the job in 0.3 microseconds. Finally using a hash set the lookup took only about 90 nanoseconds.

If you have to write the algorithm, I suggest to do a simple trick. Take to lists - one with the assigned numbers, one with the unassigned numbers starting with all numbers from 1 to 1.000.000. If you need a new number, just get a random number between zero (inclusivly) and the length of the unassigned numbers list (exclusivly), pick the number at this index and move it to the assigned numbers list. Done.

I tested this approach, too, and it took about 460 milliseconds to get all 1.000.000 numbers from the unassigned to the assigned numbers list using a hash set for the unassigned numbers to speed up the deletion and a list for the assigned numbers. That are only about 460 nanoseconds to generate a new unique random number in the given range. You have to use a sorted dictionary to avoid interferences between the random number generator and the hash algorithm.

Finally you could also take the numbers from 1 to 1.000.000, but them into a list, shuffle them for a while, and the just take one after the other from the list. Besides the initial time to shuffle the list this will run in no time at all.

Daniel Brückner
A random algorithm that picks a random number and then checks whether it is in a list to see whether it's valid can be stuck in a loop for a very long time if it keeps randomizing values between 1-1000000 and the list contains every one of those other than 1000000
abyx
Now I got it ... you are trying to write the algorithm, not testing it. Your tags are confusing... ;)
Daniel Brückner
Actually, I AM trying to test it. The problem is that for the test to be 100% air-tight for bugs, it has exhaust all the different options the algorithm has to make sure it doesn't return a non-unique result.
abyx
+3  A: 

You can pick a random number in 1-1000000 and then search linearly forward until you find a free place (eventually overlapping to 1 after 1000000 has failed to match). This way the distribution of numbers is not linear (it is when the set is mostly empty, but then gets worse and worse), but this is much faster than checking a random value each time (and I hope the skew from randomness doesn't matter too much for a test) but OTOH you're sure you only need one call to random() and it can't ever take more than 1000000 checks to find the empty space.

lapo
+1 - broadly like my answer, but better.
Dominic Rodger
A: 

One approach that might work would be to take the initial list and populate a 1 million element vector for all indices i from 1...1,000,000 with 1 if i is taken, and 0 if i is not taken.

Count the size of the initial list, call this size s.

Generate a random number j, 0 <= j < s. Loop through the array, and find the jth element which is 0, and return that.

Edit: On closer inspection of @lapo's answer - my answer appears to amount to the same thing, but be a bit slower.

Dominic Rodger
A: 

The distribution of Lapo's answer isn't linear once your set of chosen numbers starts to get too full. You'll get even distribution of integers with the following modifications:

  • Hold your initial set of numbers in a bit array, where each element in the bit array corresponds to a number in your initial set. True indicates that an item exists in the set, false otherwise.

  • Next, create an array of integers from 1 - 1,000,000. Shuffle the array. This set of numbers will be your new keys.

  • Hold a pointer to the last index from your list of new keys. When you want to generate a new key, increment the pointer to the next item in new keys; you can test to see if its already chosen in your initial set in constant time. If the item already exists in the set, increment the pointer to the next item in new keys, otherwise return the item and set its state in the bit array to true.

Juliet
Oh, this is nice. It needs to keep state in RAM, but have a "perfect" distribution. Which solution is best largely depends on circumstances...
lapo
+1  A: 

I wonder if you could break your functionnality (or test or both) in two parts:

  1. The random number generation (at least the part that belong in your code, not the standard API call I guess, unless you want to test that too). For example, you could have this code (or a more refined version, according to your technologies) :
  2. The fact that you call your method must return a value that is not in the list.

    public class RandomGenerator {  
      public int getValue() {  
        return `<random implementation>`;  
      }  
    }
    
    
    public class RandomNewGenerator {
       RandomGenerator randomGenerator = new RandomGenerator();
       public int getValue(List<Integer> ints) { 
          // note that a Set<Integer> would be more efficient
          while(true) {
            Integer i = randomGenerator.getValue();
            if (!ints.contains(i)) {
              return i;
            }
          }
       }
    }
    

In real code, I would change stuff (use an interface, inject using Spring and so on)...

That way, in your test for RandomNewGenerator, you can override the RandomGenerator with an implementation that returns a known serie of values. You can then test your RandomNewGenerator without facing any random.

I believe this is indeed the spirit of JUnit tests, to make them simple, lightning fast, and even better : repeatable! This last quality actually allow your tests to be used as regression tests, which is so convenient.


Example test code:

    public class RandomNewGeneratorTest {
      // do the set up 
      private List<Integer> empties = ...//
      private List<Integer> basics =  ...  // set up to include 1,2, 7, 8

      private class Random extends RandomNewGenerator {
         int current;
         Random(int initial) {
            current = initial;
         }
         public int getValue() {  
           return current++; // incremental values for test, not random
         }
       }

      public void testEmpty() {
         RandomNewGenerator randomNewGenerator = new RandomNewGenerator();
         // do a simple injection of dependency
         randomNewGenerator.randomGenerator = new Random(1); 
         // random starts at 1, builds up
         assertEquals(1, randomNewGenerator.getValue(empties);
         assertEquals(2, randomNewGenerator.getValue(empties);
         assertEquals(3, randomNewGenerator.getValue(empties);
         assertEquals(4, randomNewGenerator.getValue(empties);
      }

      public void testBasic() {
         RandomNewGenerator randomNewGenerator = new RandomNewGenerator();
         // do a simple injection of dependency
         randomNewGenerator.randomGenerator = new Random(5); 
         // random starts at 5, builds up
         // I expect 7, 8 to be skipped
         assertEquals(5, randomNewGenerator.getValue(basics);
         assertEquals(6, randomNewGenerator.getValue(basics);
         assertEquals(9, randomNewGenerator.getValue(basics);
      }

    }

Note that this code is only a raw sample. You could alter it any way you need, for example by giving to the random generator a sequence of the values it must return. You could test for returning twice in a row the same number, for example.

KLE
+1 because you are basically right, but I simply hate having to extract all those tiny, useless interfaces simply for tests. Hated doing that when I did Java programming, and would have it even more in my python code.
abyx
Great that we agree. In this case, I didn't create any interface, I just made possible the injection of a subclass of RandomGenerator. Do you have an opinion about the "repeatability" of the test, that comes from not using random?
KLE
Yeah, that's just about the same - wrapping random by a class. About the repeatability, it is great, but what I usually do when I know I'm testing code that uses random is make sure to set the seed, and in case of a test failure we print it, so that you can debug with the seed that failed.
abyx
Just thought I'd say I've now grown to like more this style of coding, and that indeed repeatability of it is awesome. :)
abyx
I'm glad also when I find I was wrong in the past. It makes me feel I have improved. :-)
KLE