views:

526

answers:

9

Hi,

what if you have so many entries in a table, that 2^32 is not enough for your auto_increment ID within a given period (day, week, month, ...)?
What if the largest datatype MySQL provides is not enough?

I'm wondering how should I solve a situation where I'm having so many entries added to my table which require unique ID, but I fill up my datatype within one period?

How could I natively within, MySQL (or any other system), achieve unlimited amount of unique IDs or at least increase it exponentially?

Ideally I would expect something like

> SELECT * FROM table;

+---+------+
| a |  b   |
+---+------+
| 1 |  1   |
| 1 |  2   |
| 1 |  3   |
|...| .... |
|...| .... |
| 1 | 2^32 |
| 2 |  1   |
| 2 |  2   |
+---+------+

Which exponentially increases the amount of entries.

How do you cope with such situations?
Remember - requirement is to have unique ID for any entry.

+7  A: 

Just use 128-bit keys. There is no need for an unlimited number of keys, since you very quickly allow more rows than the number of atoms in the universe. (somewhere around 256 bits).

Eclipse
+2  A: 

Don't use an autoincrementing primary key - use a GUID or similar - from the Wikipedia article:

While each generated GUID is not guaranteed to be unique, the total number of unique keys (2^128 or 3.4×10^38) is so large that the probability of the same number being generated twice is infinitesimally small. For example, consider the observable universe, which contains about 5×1022 stars; every star could then have 6.8×1015 universally unique GUIDs.

Rich
Didn't think MySQL had native support for GUIDs though?
Rowland Shaw
Then get a RDMS that can handle what you need.
Eclipse
Look up "birthday paradox". You'll actually have about a 50% chance to get the same GUID twice around the time you've generated 2^64 GUIDs. So it has no advantage over using a 64 bit autoincrementing type.
Michael Borgwardt
Depends on the algorithm you use to generate GUIDs. Most involve a time counter of some sort.
Pavel Minaev
A: 

I'm not sure how to generate them automatically in MySQL, and then, they wouldn't necessarily be sequential, but I'm pretty sure that you could use a GUID and not have to worry about them filling up.

dustyburwell
+12  A: 

You could use BIGINT for the primary key. This is a 64-bit number by default.

Edit #2: Apparently what I said before about varying the BIGINT byte length was incorrect. BIGINT is fixed at an 8-byte limit.

Jeremy CD
http://dev.mysql.com/doc/refman/5.1/en/numeric-types.html implies that BIGINTs are fixed at 8 bytes?
Rowland Shaw
Actually what Rowland says is correct - a BIGINT is always 64bit. The number that can be associated with numeric datatypes only define the display width and have NO impact on the storage capacity.
Stefan Gehrig
I'm confused 8 byte and 64 bit are the same thing right? so then whey the edit?
Fire Crow
Typically, 8 bytes and 64 bits are equivalent, and this is a typical case. I edited because I also suggested that BIGINT(n) varied the byte length, which is incorrect.
Jeremy CD
+5  A: 

I'd start by moving to BIGINT for 2^64. GUIDs would be another option, but you need to store these yourself in "some form"

Rowland Shaw
A: 

You could also use chars/varchars for your key columns, and use GUIDs for your keys. I don't know if that would incur a performance penalty when compared to integer primary keys though.

Kevlar
+1  A: 

When you add another column to your key, you are effectively doubling the number of index scans you will need to perform (albeit on a much smaller index for the second column).

As stated earlier, your best bet for VAST data sets is either a GUID (if your RDBMS supports it natively) or a varchar(16).

The nice part about using a varchar / varbinary is that you could automatically expand the column in the future, if needed. And the bad part is that varchar / varbinary is a poorly performing key, compared to an integer.

John Gietzen
+6  A: 

If you have so much data that you encounter this problem, then choosing a primary key is probably the least of your concerns.

If you're using the InnoDB engine, it may be helpful for performance to choose a primary key that you'll be frequently searching on (especially where the searches return many rows), as it clusters the primary key, which makes range scans better.

MarkR
+13  A: 

Don't you think a BIGINT UNSIGNED would be sufficient? That's a range of 0 - 18.446.744.073.709.551.615, or one year with 50.539.024.859.478.223 entries per day (365 d/y), 2.105.792.702.478.259 entries per hour, 35.096.545.041.304 entries per minute or 584.942.417.355 per second.

With assumed 600 writes per second (without any reads) you could write entries 974.904.028 years at full write speed. That should be enough.

Stefan Gehrig