views:

313

answers:

6

I've generally implemented sequence number generation using database sequences in the past.

e.g. Using Postgres SERIAL type http://neilconway.org/docs/sequences/

I'm curious though as how to generate sequence numbers for large distributed systems where there is no database. Does anybody have any experience or suggestions of a best practice for achieving sequence number generation in a thread safe manner for multiple clients?

A: 

Use java.util.concurrent.AtomicLong. It provides a threadsafe incrementAndGet() method.

public final class SequenceGenerator {
    private static final AtomicLong sequence = new AtomicLong();
    private SequenceGenerator() {}
    public static long next() {
        return sequence.incrementAndGet();
    }
}

Update: sorry, I overlooked the distributed systems bit. This of course won't work when used by different classloaders. You would need to ensure that the above class is loaded by a common system classloader or to host it at a certain "main" machine and provide it as a (web)service.

BalusC
Or you make it a service. Hook it up to a tcp server where clients can request numbers.
nos
+1  A: 

You could have each node have a unique ID (which you may have anyway) and then prepend that to the sequence number.

For example, node 1 generates sequence 001-00001 001-00002 001-00003 etc. and node 5 generates 005-00001 005-00002

Unique :-)

Alternately if you want some sort of a centralized system, you could consider having your sequence server give out in blocks. This reduces the overhead significantly. For example, instead of requesting a new ID from the central server for each ID that must be assigned, you request IDs in blocks of 10,000 from the central server and then only have to do another network request when you run out.

Steven Schlansker
A: 

Why not use a (thread safe) UUID generator?

I should probably expand on this.

UUIDs are guaranteed to be globally unique (if you avoid the ones based on random numbers, where the uniqueness is just highly probable).

Your "distributed" requirement is met, regardless of how many UUID generators you use, by the global uniqueness of each UUID.

Your "thread safe" requirement can be met by choosing "thread safe" UUID generators.

Your "sequence number" requirement is assumed to be met by the guaranteed global uniqueness of each UUID.

Note that many database sequence number implementations (e.g. Oracle) do not guarantee either monotonically increasing, or (even) increasing sequence numbers (on a per "connection" basis). This is because a consecutive batch of sequence numbers gets allocated in "cached" blocks on a per connection basis. This guarantees global uniqueness and maintains adequate speed. But the sequence numbers actually allocated (over time) can be jumbled when there are being allocated by multiple connections!

Phil
+1  A: 

If it really has to be globally sequential, and not simply unique, then I would consider creating a single, simple service for dispensing these numbers.

Distributed systems rely on lots of little services interacting, and for this simple kind of task, do you really need or would you really benefit from some other complex, distributed solution?

wsorenson
+1  A: 

There are a few strategies; but none that i know can be really distributed and give a real sequence.

  1. have a central number generator. it doesn't have to be a big database. memcached has a fast atomic counter, in the vast majority of cases it's fast enough for your entire cluster.
    • separate an integer range for each node (like Steven Schlanskter's answer)
    • use random numbers or UUIDs
    • use some piece of data, together with the node's ID, and hash it all (or hmac it)

personally, i'd lean to UUIDs, or memcached if i want to have a mostly-contiguous space.

Javier
+1  A: 

Now there are more options.

Thou this question is "old", I got here, so I think it might be useful to leave the options I know of (so far):

  • You could try Hazelcast. In it's 1.9 release it includes a Distributed implementation of java.util.concurrent.AtomicLong
  • You can also use Zookeeper. It provides methods for creating sequence nodes (appended to znode names, thou I prefer using version numbers of the nodes). Be careful with this one thou: if you don't want missed numbers in your sequence, it may not be what you want.

Cheers

Paolo
Zookeeper was the options I went with, there's a good description and writeup of this on the mailing list that I started - http://www.mail-archive.com/[email protected]/msg01967.html
Jon
Jon, thanks for pointing to that thread, that's exactly the type of solution I was thinking. BTW, did you make the code to overcome the MAX_INT limitation?
Paolo