In the following scenario:

1 database, 4 web servers

How do the web servers generate IDs for the database that are guaranteed to be unique across all servers? Auto-increment would work, but the resulting IDs are too easily crawled or guessed, so it is currently not an option.

+7  A: 

Use a UUID (http://www.ietf.org/rfc/rfc4122.txt). Collisions are unlikely, and if one occurs you can handle it by generating a new UUID. Alternatively, you could prevent them by appending an identifier that is unique to each server, such as its MAC address:

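import java.math.BigInteger;
import java.net.*;
import java.util.UUID;
// getHardwareAddress() returns a byte[] (possibly null on some hosts), so format it as hex before appending.
byte[] mac = NetworkInterface.getByInetAddress(InetAddress.getLocalHost()).getHardwareAddress();
String uid = UUID.randomUUID() + "-" + String.format("%012x", new BigInteger(1, mac));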
Jim Downing
You can, but it's the same issue. You're still going to get a lot of collisions because you can't guarantee the IDs will be unique across the different servers...
Stephane Grenier
@Stephane: are you sure about that? From http://java.sun.com/j2se/1.5.0/docs/api/java/util/UUID.html, it certainly appears that they're _universally_ unique. Does your experience differ?
CPerkins
I heard that the probability of a collision is 1 in 10^99... I think you're safe.
Zoidberg
UUIDs are 'guaranteed' to be unique even if they are created on different servers. The probability of a collision is minimal.
kgiannakakis
You shouldn't get many collisions, especially if you concatenate them with a MAC address.
Jim Downing
@CPerkins They are unique, as long as they're generated on the same JVM. Since there are 4 web servers, you have 4 separate JVMs running.
Stephane Grenier
@Zoidberg There's an interesting article at http://www.mysqlperformanceblog.com/2007/03/13/to-uuid-or-not-to-uuid/. For small databases it's not a big issue, but as the database grows it becomes one. To quote: "So in this little case we have about 200 times performance difference which is worth to consider"
Stephane Grenier
@Jim Good idea on concatenating the MAC address. That might work! Can you post it as an answer? I'll upvote it.
Stephane Grenier
Let the database create the unique ids by inserting a row into an id generator table with a uniqueidentifier column type and reading the created value back within a transaction. You can then guarantee that this value is unique across all servers because it's generated on a single table in the database.
tvanfosson
If you use `UUID.randomUUID()` you're not going to get many collisions, and you can regenerate a UUID if a collision ever causes a key constraint violation.
Jim Downing
When UUIDs are constructed, most OS implementations will incorporate an Ethernet hardware address (a.k.a. MAC address) from an Ethernet card associated with the host, so you won't gain anything by concatenating the MAC address to the UUIDs (but you will slow things down with the bigger identifier).
Stephen C. Steel
There's no need to append the MAC address. If you use `randomUUID` then you're statistically unlikely to see any collisions for hundreds of years. Just put a unique constraint on your id column and handle the error appropriately, if it ever happens.
LukeH
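A minimal sketch of the "unique constraint plus retry" idea from the comments above. The class `UniqueIdInserter` and its method names are hypothetical, and an in-memory `Set` stands in for the database column with the unique constraint; in a real application the duplicate would presumably surface as a constraint-violation error from the database instead.

```java
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical sketch: the Set stands in for a database column with a unique constraint.
final class UniqueIdInserter {
    private final Set<String> uniqueColumn = ConcurrentHashMap.newKeySet();

    // Tries to "insert" a generated ID; on a duplicate, asks the generator for a new one.
    String insertWithRetry(Supplier<String> generator, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            String id = generator.get();
            // Set.add returns false for a duplicate, playing the role of a constraint violation.
            if (uniqueColumn.add(id)) {
                return id;
            }
        }
        throw new IllegalStateException("No unique ID after " + maxAttempts + " attempts");
    }

    public static void main(String[] args) {
        UniqueIdInserter inserter = new UniqueIdInserter();
        System.out.println(inserter.insertWithRetry(() -> UUID.randomUUID().toString(), 3));
    }
}
```

With `UUID.randomUUID()` as the generator, the retry branch should essentially never run; it only exists so that a collision is handled rather than ignored.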
+1  A: 

You can use a UUID:

import java.util.UUID;        

UUID uuid = UUID.randomUUID();
System.out.println(uuid.toString());
kgiannakakis
Right, but how do you prevent collisions? As the database gets fuller and fuller, the collisions will only increase, and hence performance degrades...
Stephane Grenier
@Stephane: If you use `randomUUID` then you're statistically unlikely to see any collisions for hundreds of years. Just put a unique constraint on your id column and handle the error appropriately, if it ever happens.
LukeH
Although collisions are THEORETICALLY possible, the probability of even one collision occurring during the lifetime of your application is extremely small.
Stephen C. Steel
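To put a rough number on "extremely small": a random (version 4) UUID, which is what `UUID.randomUUID()` produces, has 122 random bits. The sketch below uses the standard birthday-bound approximation to estimate the chance of at least one collision among n IDs. The class and method names are made up for illustration, and the approximation only holds while the result is small.

```java
final class UuidCollisionEstimate {
    // Random (version 4) UUIDs carry 122 random bits; the other 6 bits are fixed.
    static final int RANDOM_BITS = 122;

    // Birthday-bound approximation: p ~= n^2 / (2 * 2^bits), valid while p is small.
    static double collisionProbability(double idCount) {
        return (idCount * idCount) / Math.pow(2.0, RANDOM_BITS + 1);
    }

    public static void main(String[] args) {
        // One trillion IDs gives roughly 9.4e-14.
        System.out.println(collisionProbability(1e12));
    }
}
```

This assumes the UUIDs come from a good random source, as `UUID.randomUUID()` does by using a cryptographically strong generator.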
@Stephen Maybe not as small as you think: http://www.mysqlperformanceblog.com/2007/03/13/to-uuid-or-not-to-uuid/
Stephane Grenier
@Stephen It really depends on the size of your data. For this database we're expecting a LOT of data.
Stephane Grenier
@Stephane: The linked article is mainly talking about performance issues due to the *size* of UUIDs and how they're stored in MySQL. I don't see any mention of actual UUID collisions.
LukeH
+1  A: 

If you are really worried about collisions, you can pre-generate your keys and store them in a database table with a unique index. Then have a periodic job that populates the table during downtime, and removes/archives used keys once in a while.

RedFilter
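A rough sketch of the pre-generated key pool described above, assuming an in-memory stand-in for the database table. `KeyPool`, `refill`, and `take` are hypothetical names; the `everIssued` set plays the role of the unique index, and `refill` is what the periodic job would call.

```java
import java.util.Optional;
import java.util.Queue;
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical in-memory stand-in for a table of pre-generated keys.
final class KeyPool {
    // Plays the role of the unique index: a key can only ever be added once.
    private final Set<String> everIssued = ConcurrentHashMap.newKeySet();
    private final Queue<String> available = new ConcurrentLinkedQueue<>();

    // The periodic job: top the pool up to targetSize unused keys.
    void refill(int targetSize) {
        while (available.size() < targetSize) {
            String key = UUID.randomUUID().toString();
            if (everIssued.add(key)) { // skip the key if it was ever generated before
                available.add(key);
            }
        }
    }

    // Hands out one unused key, or an empty Optional if the pool has run dry.
    Optional<String> take() {
        return Optional.ofNullable(available.poll());
    }
}
```

A web server would call `take()` for each new row and fall back to some other strategy (or wait for the next refill) when the pool is empty.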
+1  A: 

What DB system are you using? Does the app know which server is making the request? Are you letting the DB decide the key, or setting it in code?

It could be as simple as using an auto-increment with a prefix or a second field indicating the server that requested the key.

NickSentowski
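For illustration, a tiny sketch of the prefix idea, with the counter kept in the application rather than the database. `PrefixedIdGenerator` and the `web1`/`web2` prefixes are made up, and the `AtomicLong` only stands in for whatever auto-increment source is actually used. Note that IDs built this way are unique across servers but still sequential.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: a per-server prefix keeps counters on different servers from clashing.
final class PrefixedIdGenerator {
    private final String serverPrefix;
    private final AtomicLong counter = new AtomicLong(); // stands in for an auto-increment column

    PrefixedIdGenerator(String serverPrefix) {
        this.serverPrefix = serverPrefix;
    }

    // Returns IDs such as "web1-1", "web1-2", ...
    String nextId() {
        return serverPrefix + "-" + counter.incrementAndGet();
    }
}
```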
A: 

I'm not sure why an auto-increment or sequence is unacceptable. You want an internal ID to not be "guessable"? Is this something like an account number, where you don't want someone to be able to guess a valid one?

Well, okay, besides UUIDs already mentioned, two obvious possibilities come to mind.

  1. Use a sequence, then generate a random number, and create the account number from a combination of the two using an algorithm such that two different sequence numbers cannot give the same final number. For example, a simple algorithm would be: take the next sequence number, multiply it by 12345678, generate a random number from 0 to 12345678-1, and add the two together.

  2. Have a table on the database with one record, which holds the last assigned number. Each time you need a new number, lock this record, use the previous value to generate the next value, and update the record. As long as the numbers always increase, you're guaranteed to not have a duplicate.
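The first option above could be sketched roughly like this. `ScrambledSequence` and its method names are hypothetical, and the sequence number is passed in as a parameter because it would come from the database sequence. Since the random offset is always smaller than the multiplier, two different sequence numbers can never produce the same result.

```java
import java.security.SecureRandom;

// Hypothetical sketch of "sequence number * 12345678 + random offset".
final class ScrambledSequence {
    static final long MULTIPLIER = 12_345_678L;
    private static final SecureRandom RANDOM = new SecureRandom();

    // Combines a sequence number with a random offset in the range 0 .. MULTIPLIER-1.
    static long accountNumber(long sequenceNumber) {
        long offset = RANDOM.nextInt((int) MULTIPLIER);
        // The exact-arithmetic helpers throw instead of silently overflowing.
        return Math.addExact(Math.multiplyExact(sequenceNumber, MULTIPLIER), offset);
    }

    // Recovers the sequence number, which shows that the mapping cannot collide.
    static long sequenceOf(long accountNumber) {
        return accountNumber / MULTIPLIER;
    }

    public static void main(String[] args) {
        System.out.println(accountNumber(1));
        System.out.println(accountNumber(2));
    }
}
```

Anyone who knows the multiplier can still recover the sequence number, so this only makes a valid number hard to guess; it does not hide the ordering.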

If you have some scheme that uses an identifier of the server as part of the ID, I'd encourage you not to make that identifier simply a number stored in a configuration file somewhere. I'm working on a system now where someone had the bright idea to give each server a "server id" that is built into record IDs, and the server id is a small integer that is manually assigned. It's not too hard in production, where there are only 3 servers. But in development and testing, where servers are coming up and going down all the time and test configuration files are constantly being tossed around, it's a pain to administer. I'd avoid using a server id, period, but if you're going to use one, have it automatically assigned by some central server, or derive it from the IP, or something safe.

Jay