views:

15

answers:

0

I am curious about the contraints and tradeoffs for generating unique sequence numbers in a distributed and concurrent environment.

Imagine this: I have a system where all it does is give back an unique sequence number every time you ask it. Here is an ideal spec for such a system (constraints):

  • Stay up under high-load.
  • Allow as many concurrent connection as possible.
  • Distributed: spread load across multiple machines.
  • Performance: run as fast as possible and have as much throughput as possible.
  • Correctness; numbers generated must:
    1. not repeat.
    2. be unique per request (must have a way break ties if any two request happens at the exact same time).
    3. in (increasing) sequential order.
    4. have no gaps between requests: 1,2,3,4... (effectively a counter for total # requests)
  • Fault tolerant: if one or more, or all machines went down, it could resume to the state before failure.

Obviously, this is an idealized spec and not all constraints can be satisfied fully. See CAP Theorem. However, I would love to hear your analysis on various relaxation of the constraints. What type of problems will we left with and what algorithms would we use to solve the remaining problems. For example, if we rid of the counter constraint, then the problem becomes much easier: since gaps are allowed, we can just partition the numeric ranges and map them onto different machines.

Any references (papers, books, code) are welcome. I'd also like to keep a list of existing software (open source or not).


Software:

  • Snowflake: a network service for generating unique ID numbers at high scale with some simple guarantees.