views:

621

answers:

8

I recently refactored a piece of code used to generate unique negative numbers.
edit: Multiple threads obtain these ids and add as keys to a DB; numbers need to be negative to be easily identifiable -at the end of a test session they're removed from the DB.

My Java algorithm looks like this:

private final Set<Integer> seen = Collections.synchronizedSet(new HashSet<Integer>());
public Integer generateUniqueNegativeIds() {
    int result = 0;
    do {
        result = random.nextInt();
        if (result > 0) {
            result *= -1;
        }
    } while (!seen.add(result));
    return result;
}

The code structure above, with its speculative addition to the set and "retry" loop, makes me think there's an equivalent nonblocking algorithm that replaces the synchronized set with any of the atomic variables.

I made a few attempts to re-write using atomic variables but all failed the multithread attack test.

Is there an elegant nonblocking equivalent?

edit: for curiosity's sake here's a flawed attempt using an atomic integer as a guard

private final AtomicInteger atomi = new AtomicInteger(0);
public Integer generateUniqueNegativeIdsWithAtomicAlgo() {
    boolean added = false;
    int result = 0;
    do {
        result = random.nextInt();
        if (result > 0) {
            result *= -1;
        }
        if (atomi.compareAndSet(0, result)) {
            added = cache.add(result);
        }   
    } while (!added);
    return atomi.getAndSet(0);
}

edit: test harness below:

public static void main(String[] args) {
    final int NUMBER_OF_THREADS = 10000;
    final Set<Integer> uniques = Collections.synchronizedSet(new HashSet<Integer>());
    final List<Integer> positives = Collections.synchronizedList(new ArrayList<Integer>());
    final NegativeUniqueIdGenerator nuig = new NegativeUniqueIdGenerator();
    Thread[] workers = new Thread[NUMBER_OF_THREADS];
    long start = System.nanoTime();
    for (int i = 0; i < workers.length; i++) {
        Runnable runnable = new Runnable() {
            public void run() {
                int number = nuig.generateUniqueNegativeIds();
                if (number > 0) {
                    positives.add(number);
                }
                uniques.add(number);
            }
        };
        workers[i] = new Thread(runnable);
        workers[i].start();
    }
    for (int i = 0; i < workers.length; i++) {
        try {
            workers[i].join();
        } catch (InterruptedException ie) {}
    }
    long end = System.nanoTime();
    System.out.println(String.format("duration = %dns", (end - start)));
    System.out.println(String.format("#threads = %d", NUMBER_OF_THREADS));
    System.out.println(String.format("#uniques = %d", uniques.size()));
    System.out.println(String.format("#positives = %d", positives.size()));
    System.out.println(String.format("#duplicates = %d", NUMBER_OF_THREADS - uniques.size()));
    System.out.println(String.format("ratio = %f",
   ((double) NUMBER_OF_THREADS - uniques.size())
     / NUMBER_OF_THREADS));
    assert uniques.size() == NUMBER_OF_THREADS;
}
+3  A: 

The elegant solution for all listed requirements, as far as I can tell, is just decrementing a value starting at -1. I suspect, however, that you haven't listed all requirements.

Devin Jeanpierre
+9  A: 

If you're not concerned about the randomness, you can just decrement a counter, like this:

private final AtomicInteger ai=new AtomicInteger(0);

public int nextID() {
  return ai.addAndGet(-1);
}

Edit:

For random numbers, you can just use your solution and use eg. ConcurrentHashMap or ConcurrentSkipListSet instead of the synchronizedSet. You have to ensure that different threads use different instances of the random generator, and that these generators are not correlated.

jpalecek
How about decrementAndGet()? It's better. But +1 for being good enough.
erickson
I agree with erickson
David Zaslavsky
Yeah, almost any other method of AtomicInteger would do (getAndAdd, getAndDecrement ...). AFAIK, they are all equally efficient on Intel. It could be implemented in terms of compareAndSet, too, but that's more complex
jpalecek
My main reason is just that it's more readable.
erickson
By nonblocking I think the questioner meant reentrant and not using synchronized.
Bill K
Could be, but I don't think so. Nonblocking and reentrant are totally different concepts.
David Zaslavsky
An atomic like this satisfies the non-blocking criterion because there is no need to wait for the atomic variable to be available. Re-entrancy is also possible because any number of code paths may reach the atomic op concurrently without violating the atomic update guarantee.
TokenMacGuy
True, decrementing is a very elegant solution -and one that made me laugh. Thanks guys. But what if the requirements specify unpredictability? Can we still write a nonblocking algo along similar lines?
jorgetown
@jorgetown: If you require unpredictability, you could simply use sequential integer offsets into a precalculated table of negative "random numbers."
j_random_hacker
there will still be AS operations to protect the memory in this route but the overhead should be much less than a contended lock.preallocation of blocks of integers will be fastest but involves more set up
ShuggyCoUk
+1  A: 

I think what you mean is non-blocking and reentrant.

edit: (replaces my original because this is much better)

A thread-based option that is actually pretty performant just came to mind (at least more performant than your original). If you created a weak hash map with a thread object as the "Key" and as the "Value" put an object with the ability to manufacture a series of, say, 1000 numbers from a specific range.

That way you assign each thread it's own 1000 number range to allocate from. When the object runs out of numbers, have it return an invalid number (0?) and you'll know that you have to allocate a new range to that object.

There would be no synchronizing anything anywhere (edit: whoops, was a little wrong. See below), the weak hash map would automatically free threads that were destroyed (no special maintenance) and the slowest part would be a single hash lookup of the thread which is actually very quick.

get the current running thread with:

Thread currThread=Thread.getCurrentThread();

Also I could be wrong and you could just need to make the method synchronized, then this would work:

int n=-1;
synchronized int getNegativeNumber() {
    return n--;
}

I went ahead and wrote it (sometimes this stuff gets stuck in my head until I do it, and as long as I did it anyway, I might as well post it). Untested and all, but I'm pretty sure it should be close if not operational right out of the box. Just one class with one static method to call to get a unique negative number. (Oh, and I did need some synchronization but it will only be used .001% of the time).

Wish there was a way to create a linked code block instead of inline like this without going off site--sorry about the length.

package test;

import java.util.WeakHashMap;

public class GenNumber {
 // Static implementation goes first.
 private static int next = -1;
 private static final int range = 1000;

 private static WeakHashMap<Thread, GenNumber> threads = new WeakHashMap<Thread, GenNumber>();

 /**
  * Generate a unique random number quickly without blocking
  * 
  * @return the random number < 0
  */
 public static int getUniqueNumber() {
  Thread current = Thread.currentThread();
  int next = 0;

  // Have to synchronize some, but let's get the very
  // common scenario out of the way first without any
  // synchronization. This will be very fast, and will
  // be the case 99.9% of the time (as long as range=1000)
  GenNumber gn = threads.get(current);
  if (gn != null) {
   next = gn.getNext();
   if (next != 0)
    return next;
  }

  // Either the thread wasn't found, or the range was
  // used up. Do the rest in a synchronized block.
  // The three lines tagged with the comment "*" have
  // the potential to collide if this wasn't synchronized.
  synchronized (threads) {
   if (gn == null) {
    gn = new GenNumber(next -= range); // *
    threads.put(current, gn); // *
    return gn.getNext(); // can't fail this time
   }
   // now we know the range has run out

   gn.setStart(next -= range); // *
   return gn.getNext();
  }
 }

 // Instance implementation (all private, nobody needs to see this)
 private int start;
 private int count;

 private GenNumber(int start) {
  setStart(start);
 }

 private int getNext() {
  if (count < range)
   return start - count;
  return 0;
 }

 private GenNumber setStart(int start) {
  this.start = start;
  return this;
 }
}

It just struck me that instead of the one large synchronized block could be replaced by 2 very small ones synchronized on different objects, one for the "+= count" and one for the .put(). If collisions were still slowing you down, that might help (although if collisions were still slowing you down (REALLY???) you'd be better served just raising count.

Bill K
Do you think you can run threads.get() without the lock, eg. when a threads.put() may be concurrently running?
jpalecek
Yes, since there is no way the get is looking for what's being put (by definition since they have to be on different threads). If there was a possible race condition (if you could be looking for what you are putting in at the same time) it wouldn't work.
Bill K
You seem to be uderestimating the possibility of race conditions. Mere concurrent accesses to the threads object, where one of the accesses is a write, constitutes a race condition and mean the read might eg. not return past values that are in the Map.
jpalecek
You could be right, forgot about updating the hash. All failures are guaranteed detectable though, so you still don't need a synchronized for the read but it might help to surround it in a retry loop (try/catch in side a while)
Bill K
+6  A: 

The other answers which suggest using a counter are excellent, but if nonpredictability (or at least, nontrivial predictability) is important, your original algorithm should be just fine.

Why?

Basically, the probability that you will get a repeat integer is very very (very) (very) small, roughly 1 divided by the number of integers you have not yet seen. If you've already generated N numbers, the expected runtime of the algorithm is approximately linear in N with a coefficient of 1/2^32, which means that you would have to generate over a billion numbers just to get the expected runtime to exceed 2 iterations of the loop! In practice, checking the set for the existence of a certain number will do far more to stretch out the runtime of your algorithm than the possibility of repeating the loop (well, unless you're using a HashSet maybe - I forget what its asymptotic runtime is).

For what it's worth, the exact expected number of loop iterations is

2^64/(2^32 - N)^2

After you've generated a million numbers, this works out to 1.00047 - which means, say, to generate the 1,000,001th to 1,002,000th numbers, you'll probably get one repeat number, total, in all those calls.

David Zaslavsky
True, but I've seen lots of cases where someone didn't think of using a counter and went straight to random--very few cases where people actually needed a non-predictable answer.
Bill K
+2  A: 

From the requirements you've given, I would personally just use a medium-quality random number generator that you know won't produce duplicates within the number of unique numbers that you require. Unless you have an extra requirement you haven't mentioned, it seems overkill to keep the set of all the previously generated numbers.

For example, using a 32-bit XORShift generator will produce all 2^31 negative 4-byte integers in "random" order before repeating the pattern. If you need more numbers than that, you probably don't want to be putting them in a hash set anyway. So something like this (warning: off-top-of-head untested code...):

int seed = (int) System.nanoTime();
final int origSeed = seed;

public int nextUniqueNegativeNumber() {
  int n = seed;
  do {
    n ^= (n << 13);
    n ^= (n >>> 17);
    n ^= (n << 5);
    seed = n;
    if (n == origSeed) {
      throw new InternalError("Run out of numbers!");
    }
  } while (n > 0);
  return n;
}

I'll leave it up to the reader to convert "seed" to use an AtomicInteger if concurrency is necessary...

Edit: actually, to optimise the concurrent case you maybe only want to write back to "seed" after getting the next negative number.

OK, by popular demand, the atomic version would then be something like this:

  AtomicInteger seed = new AtomicInteger((int) System.nanoTime());

  public int nextUniqueNegativeNumber() {
    int oldVal, n;
    do {
      do {
        oldVal = seed.get();
        n = oldVal ^ (oldVal << 13); // Added correction
        n ^= (n >>> 17);
        n ^= (n << 5);
      } while (seed.getAndSet(n) != oldVal);
    } while (n > 0);
    return n;
  }
Neil Coffey
I really like your approach here, even though it doesn't address my question of "can the algorithm be rewriten using nonblocking constructs?"
jorgetown
Sorry, didn't realise that making it atomic was the difficult bit. It's really just a trivial application of, say, AtomicInteger. But I've added the code.
Neil Coffey
Your 'shuffle' generates lots of duplicates, even if not immediate ones due to the loop guard
jorgetown
Sorry, I made a typo in the concurrent version (see correction). But in principle, the method really shouldn't generate duplicates until it's gone through every possible 32-bit number. Then, sure, it repeats the cycle-- there are only a finite number available after all...!
Neil Coffey
Thanks for the tutorial and research references @ http://www.javamex.com/tutorials/random_numbers/xorshift.shtml, Neil. Terrific stuff!
jorgetown
Neil, perhaps you'll find this research paper interesting too:On the Xorshift Random Number Generatorshttp://www.iro.umontreal.ca/~lecuyer/myftp/papers/xorshift.pdfThey argue Marsaglia's generators have poor equidistribution -something I observed in my cross hardware|platform testing.
jorgetown
Checkout pages 5 and 11 of their paper for some results.I found their suggested 'best' tuple (7, 1, 9) better than (5, 17, 13) across different hardware and OSs. I observed ~.1% of duplicates testing the tuple (5, 17, 13) with 1000+ threads in different platforms
jorgetown
Thanks, I definitely will check that out.
Neil Coffey
+2  A: 

Try this: http://www.javaconcurrencyinpractice.com/listings.html

Tusc
+2  A: 

I would combine the OP's answer with jpalecek's to give:

private final AtomicInteger ai=new AtomicInteger(0);

public int nextID() {
    return ai.addAndGet(-1 - random.nextInt(1000));
}
Skip Head
+2  A: 

The high-scale lib has a NonBlockingHashSet which you can use. Just replace your set instance with an instance of NonBlockingHashSet and you are all set.

http://sourceforge.net/projects/high-scale-lib

pdeva