I'm working on a project that I want to be as flexible and scalable as possible from the beginning. One problem I'm concerned about is best described by Joshua Schachter in Founders at Work, who named it as one detail he wished he had planned for ahead of time:

Scaling past one machine, one database, is very challenging, even with replication. The tools that are there are not quite right.

For example, when you add things to a table and it numbers them, that means you can't have a second machine also adding to them because the numbers will collide. So what do you do? You have to come up with some completely different way to do it.

Do you have a central server that hands out number sets, or do you come up with something that's not numbers? Do you use random numbers and hope they never collide? Whatever it is, auto-assigned IDs just don't fly.

Has anyone here faced this problem? What are some ways to move beyond auto-incremented IDs, or is there a way to make them scale across multiple servers?
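One of the options the quote mentions, a central server that hands out number sets, can be sketched as a block allocator (sometimes called hi/lo). This is a minimal in-process sketch under stated assumptions: `BlockAllocator` and `IdSource` are hypothetical names, and a real allocator would have to persist its counter (for example in a single database row updated in a transaction) so that restarts never reissue a block.

```python
import threading


class BlockAllocator:
    """Hypothetical central service that hands out disjoint ranges of IDs."""

    def __init__(self, block_size: int = 1000) -> None:
        self._block_size = block_size
        self._next_start = 1
        self._lock = threading.Lock()

    def reserve_block(self) -> range:
        # The lock makes each reservation atomic, so no two callers
        # ever receive overlapping ranges.
        with self._lock:
            start = self._next_start
            self._next_start += self._block_size
        return range(start, start + self._block_size)


class IdSource:
    """Per-server ID source; contacts the allocator only when its block runs out."""

    def __init__(self, allocator: BlockAllocator) -> None:
        self._allocator = allocator
        self._ids = iter(())  # start empty so the first call reserves a block

    def next_id(self) -> int:
        try:
            return next(self._ids)
        except StopIteration:
            self._ids = iter(self._allocator.reserve_block())
            return next(self._ids)
```

With `block_size=3`, a first server drawing four IDs gets 1, 2, 3 and then 4 from its second block (4 to 6), while a second server starting afterwards draws from 7 to 9. The trade-off is that IDs are unique but not globally ordered by creation time, and unused numbers in a block are skipped when a server shuts down.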

A:

Use GUIDs; the chance of a collision is astronomically low.
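To put a number on "astronomically low", the usual birthday-bound approximation can be computed directly. This sketch assumes the GUIDs are random version-4 UUIDs, which carry 122 random bits; other GUID generators (time- or MAC-based ones) have different collision behavior, and the function name is hypothetical.

```python
import math


def collision_probability(n: int, random_bits: int = 122) -> float:
    """Approximate chance that n random IDs contain at least one duplicate.

    Uses the birthday bound p ~= 1 - exp(-n(n-1) / (2 * 2**random_bits)).
    expm1 keeps precision when the probability is tiny.
    """
    space = 2 ** random_bits
    return -math.expm1(-n * (n - 1) / (2 * space))


# One trillion version-4 UUIDs: roughly 9.4e-14.
print(collision_probability(10 ** 12))
```

Even at a trillion rows the probability stays far below the rate of hardware faults; it only reaches about 39% around 2**61 IDs.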

It's also possible to use what we called SmartGUIDs (usually called COMB GUIDs; see this analysis, particularly page 7), which encode a timestamp within the GUID. You then get the record's creation date "for free" and can drop a separate creation-datetime column, which recovers some of the storage you lose by moving from a 32-bit integer to a 128-bit GUID. Unlike regular GUIDs, these can also be generated so that they increase monotonically, which is useful for clustered indexes and for sorting.
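A COMB-style generator can be sketched as follows. The layout is an assumption chosen for portability: a 48-bit millisecond Unix timestamp in the first 6 bytes, followed by 10 random bytes, so byte-wise (and `uuid.UUID`) ordering follows creation order. COMBs aimed at SQL Server instead put the timestamp in the last bytes to match how that server sorts `uniqueidentifier` values, and this sketch does not set standard UUID version bits. `CombGenerator` and `created_at_ms` are hypothetical names.

```python
import os
import threading
import time
import uuid


class CombGenerator:
    """Hypothetical COMB-style generator: 6-byte ms timestamp + 10 random bytes."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._last_ms = 0

    def new_id(self) -> uuid.UUID:
        with self._lock:
            now_ms = int(time.time() * 1000)
            # Never repeat or go backwards: if the clock stalls or steps back,
            # bump by 1 ms so IDs from this generator strictly increase.
            ms = max(now_ms, self._last_ms + 1)
            self._last_ms = ms
        prefix = ms.to_bytes(6, "big")
        return uuid.UUID(bytes=prefix + os.urandom(10))


def created_at_ms(comb: uuid.UUID) -> int:
    """Recover the embedded creation time in milliseconds since the Unix epoch."""
    return int.from_bytes(comb.bytes[:6], "big")
```

The monotonic guarantee here holds per generator instance only; IDs from different machines interleave by wall-clock time, and under heavy bursts the embedded timestamp can run slightly ahead of the real clock because of the 1 ms bump.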

You can also use a composite key made of a server (or database) ID plus an ordinary auto-increment identity or autonumber column.
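The composite-key approach can be illustrated with SQLite. In this sketch, assumed purely for illustration, each "server" is an in-memory database with its own ordinary `AUTOINCREMENT` counter, and rows are merged into a central table whose primary key is `(server_id, local_id)`. The table and function names are hypothetical, and the server IDs are assumed to be assigned by configuration.

```python
import sqlite3


def make_shard() -> sqlite3.Connection:
    # Each server keeps its own ordinary auto-increment counter.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (local_id INTEGER PRIMARY KEY AUTOINCREMENT, item TEXT NOT NULL)"
    )
    return conn


def merge(shards: dict[int, sqlite3.Connection]) -> sqlite3.Connection:
    # The central table is keyed on (server_id, local_id), so equal local
    # IDs from different servers no longer collide.
    central = sqlite3.connect(":memory:")
    central.execute(
        "CREATE TABLE orders ("
        " server_id INTEGER NOT NULL,"
        " local_id INTEGER NOT NULL,"
        " item TEXT NOT NULL,"
        " PRIMARY KEY (server_id, local_id))"
    )
    for server_id, shard in shards.items():
        rows = shard.execute("SELECT local_id, item FROM orders").fetchall()
        central.executemany(
            "INSERT INTO orders (server_id, local_id, item) VALUES (?, ?, ?)",
            [(server_id, local_id, item) for local_id, item in rows],
        )
    return central


shards = {1: make_shard(), 2: make_shard()}
shards[1].execute("INSERT INTO orders (item) VALUES ('apple')")
shards[2].execute("INSERT INTO orders (item) VALUES ('pear')")
central = merge(shards)
rows = central.execute(
    "SELECT server_id, local_id, item FROM orders ORDER BY server_id"
).fetchall()
print(rows)  # both rows have local_id 1, but the composite keys differ
```

The keys stay small and human-readable, at the cost of every foreign key and join needing both columns.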

Cade Roux
A:

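Use a GUID/UUID (globally/universally unique identifier). It is not strictly guaranteed to be unique, but the probability of two machines generating the same one is negligible in practice, so each machine can generate IDs independently.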
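As a small illustration of generating keys independently on separate machines, the sketch below simulates two database servers with in-memory SQLite databases. Each assigns `uuid.uuid4()` keys with no coordination, and their rows are then merged into one table whose primary key would reject any duplicate. Storing the UUID as TEXT is an assumption made for readability; a 16-byte BLOB (`uuid.bytes`) is more compact. The helper names are hypothetical.

```python
import sqlite3
import uuid


def new_server_db() -> sqlite3.Connection:
    # Stand-in for one database server; the key is a UUID stored as TEXT.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id TEXT PRIMARY KEY, name TEXT NOT NULL)")
    return conn


def add_user(conn: sqlite3.Connection, name: str) -> str:
    user_id = str(uuid.uuid4())  # generated locally, no central counter needed
    conn.execute("INSERT INTO users (id, name) VALUES (?, ?)", (user_id, name))
    return user_id


server_a = new_server_db()
server_b = new_server_db()
for i in range(100):
    add_user(server_a, f"a{i}")
    add_user(server_b, f"b{i}")

# Merge both servers' rows; the PRIMARY KEY would raise on a duplicate ID.
merged = new_server_db()
for server in (server_a, server_b):
    merged.executemany(
        "INSERT INTO users (id, name) VALUES (?, ?)",
        server.execute("SELECT id, name FROM users").fetchall(),
    )
```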

lubos hasko