ansaurus

Question

Should I initialize my AUTO_INCREMENT id column to 2^32+1 instead of 0?

Answer 1

+3 A:

Actually, 0 can be problematic with many persistence libraries, because they use it as a sentinel value (a substitute for NULL). Rightly or wrongly, I would avoid using 0 as a primary key value. The convention is to start at 1 and count up. With negative numbers you're likely just to confuse people for no good reason.
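To make the sentinel problem concrete, here is a minimal Python sketch of the kind of check some persistence layers perform. The `Message` class and the `is_unsaved` helper are hypothetical and not taken from any particular library. The point is only that a falsy-id test cannot tell a real key of 0 apart from "no key yet".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    id: Optional[int]  # None until the row has been inserted
    text: str


def is_unsaved(msg: Message) -> bool:
    # Hypothetical check: any falsy id (None *or* 0) is read as
    # "this object has no primary key yet", so it would be INSERTed again.
    return not msg.id


print(is_unsaved(Message(None, "draft")))  # True, as intended
print(is_unsaved(Message(0, "row zero")))  # True, even though 0 is a real key
print(is_unsaved(Message(1, "row one")))   # False
```

Starting the sequence at 1 sidesteps this whole class of bug.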

cletus 2009-06-12 22:40:33

How about starting with 1+2^32 then? The only problem I can see there is that it reduces my available id range by 2^32 values, which likely isn't an issue in a 64-bit id space.
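A quick back-of-the-envelope check of that cost in Python, assuming a signed 64-bit (BIGINT) column whose maximum value is 2^63 - 1:

```python
SIGNED_64_MAX = 2**63 - 1  # largest value a signed BIGINT can hold

start_default = 1          # conventional AUTO_INCREMENT start
start_high = 2**32 + 1     # proposed start, just past any 32-bit range

ids_default = SIGNED_64_MAX - start_default + 1
ids_high = SIGNED_64_MAX - start_high + 1

lost = ids_default - ids_high  # ids given up by starting high
print(lost)                    # 4294967296, i.e. 2**32
print(lost / ids_default)      # fraction of the space given up, roughly 4.7e-10
```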

slacy 2009-06-12 22:43:45

What cletus said. Just make the docs and WSDL (or what have you) clear that the ID is an int64, and call it a day.

Wyatt Barnett 2009-06-12 22:44:16

The problem is that documentation may not be enough: I can never guarantee that every API user "does the right thing" with these numbers, and their code may break once ids pass 2^31-1 if they use a signed int32, or 2^32-1 if they use an unsigned one. I can never tell what they're doing, since I don't control their code. Twitpocalypse!
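A minimal Python sketch of what happens when a client forces a 64-bit id into 32 bits. The `to_int32` and `to_uint32` helpers are hypothetical; they keep only the low 32 bits of the value, which is what, for example, a Java `(int)` cast of a `long` does.

```python
def to_uint32(n: int) -> int:
    # Keep only the low 32 bits, read as an unsigned value.
    return n & 0xFFFFFFFF


def to_int32(n: int) -> int:
    # Keep the low 32 bits and reinterpret them as two's-complement signed.
    n &= 0xFFFFFFFF
    return n - 2**32 if n >= 2**31 else n


for msg_id in (2**31 - 1, 2**31, 2**32 - 1, 2**32, 2**32 + 1):
    print(msg_id, to_int32(msg_id), to_uint32(msg_id))
# 2147483647 2147483647 2147483647
# 2147483648 -2147483648 2147483648
# 4294967295 -1 4294967295
# 4294967296 0 0
# 4294967297 1 1
```

Note that truncation maps 2**32 + 1 to 1, so a client that silently truncates, rather than failing with an error, may not notice the problem right away.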

slacy 2009-06-12 22:49:42

Answer 2

+5 A:

If you want to achieve your goal and avoid the problems that cletus mentioned, the solution is to set your starting value to 2^32+1. There are still plenty of IDs left, and the value won't fit in a 32-bit integer, signed or unsigned.
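The claim that 2^32+1 fits in neither 32-bit range is easy to check. A small Python sketch; the helper names are hypothetical:

```python
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1
UINT32_MAX = 2**32 - 1


def fits_int32(n: int) -> bool:
    return INT32_MIN <= n <= INT32_MAX


def fits_uint32(n: int) -> bool:
    return 0 <= n <= UINT32_MAX


start = 2**32 + 1  # proposed AUTO_INCREMENT starting value
print(start, fits_int32(start), fits_uint32(start))  # 4294967297 False False
```

In MySQL the counter itself can be seeded with a table option such as `ALTER TABLE messages AUTO_INCREMENT = 4294967297;` (the table name `messages` is hypothetical).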

Of course, documenting the value's range and providing guidance to your API or data customers is the only right solution. Someone's always going to try to stick a long into a char and wonder why it doesn't work (always).

caskey 2009-06-12 22:45:20

Answer 3

+3 A:

What if you provided a set of test suites, or a test service that used messages in the "high but still valid" range, and persuaded your service users to use it to validate that their code is correct? Starting at an arbitrary value for defensive reasons is a little weird to me; providing sanity tests sits right with me.
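A minimal sketch of such a sanity check in Python. The edge-case ids, `failing_ids`, and the two sample client parsers are all hypothetical; a real suite would exercise the API user's own code against ids served by a test endpoint.

```python
from typing import Callable, List

# "High but still valid" ids: the 32-bit boundaries plus the signed 64-bit maximum.
EDGE_CASE_IDS = [2**31 - 1, 2**31, 2**32 - 1, 2**32 + 1, 2**63 - 1]


def failing_ids(parse_id: Callable[[str], int]) -> List[int]:
    """Return every edge-case id that the client's parser does not round-trip."""
    return [i for i in EDGE_CASE_IDS if parse_id(str(i)) != i]


def good_client(raw: str) -> int:
    return int(raw)  # Python ints are arbitrary precision, so every id survives


def int32_client(raw: str) -> int:
    # Simulates a client that stores ids in a signed 32-bit integer.
    n = int(raw) & 0xFFFFFFFF
    return n - 2**32 if n >= 2**31 else n


print(failing_ids(good_client))   # []
print(failing_ids(int32_client))  # every listed id from 2**31 upward
```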

Talljoe 2009-06-12 22:46:45

What if I chose the first round number above 2^32, say 5000000000? Does that make you feel better?

slacy 2009-06-12 22:50:39

I like the idea of a regression suite. Can someone please suggest this to the people at http://apiwiki.twitter.com

slacy 2009-06-12 22:52:59

A little better. :) I once chose a large non-round number to seed a publicly viewable ID so it looked like we had more traffic than we did. Doing it to pander to people who can't read the documentation: less interested. ;)

Talljoe 2009-06-13 00:23:14

Answer 4

+1 A:

If everyone alive on the planet sent one message per second, every second, non-stop, your counter wouldn't wrap until the year 2050 using 64-bit integers.

Probably just starting at 1 would be sufficient.

(But if you did start at the lower bound, it would extend into the start of 2092.)

lavinio 2009-06-13 00:22:26

Your math sucks, lavinio. ;) 32-bit signed integers max out at ~2 billion, unsigned at ~4 billion. There are nearly 7 billion people on earth. And Twitter has already seen this happen just this week: http://www.twitpocalypse.com/

richardtallent 2009-06-16 00:00:02

Using 64-bit signed integers:

2^63 ÷ 7,000,000,000 ÷ 365.25 ÷ 24 ÷ 60 ÷ 60 ≈ 41
2009 + 41 = 2050

(The original post mentioned 64-bit integers, so that's what I went with. My math doesn't suck; my English does. ;))
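lavinio's estimate redone in Python, under the thread's assumptions of 7 billion people, one message per person per second, and a start in 2009:

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60
PEOPLE = 7_000_000_000  # rough world population used in the thread
START_YEAR = 2009

# Counting up from 0 to the signed 64-bit maximum (about 2**63).
years_signed = 2**63 / PEOPLE / SECONDS_PER_YEAR
# Starting at the lower bound (-2**63) doubles the usable range to 2**64.
years_full = 2**64 / PEOPLE / SECONDS_PER_YEAR

print(round(years_signed, 1), START_YEAR + int(years_signed))  # 41.8 2050
print(round(years_full, 1), START_YEAR + int(years_full))      # 83.5 2092
```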

lavinio 2009-06-16 00:47:11

Exceeding 2^63 isn't the issue; exceeding 2^31 is. So although the 2050 math is right, 2 billion (i.e. 2^31) isn't that big a number anymore, especially when you've got scripts generating messages, not machines.
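To put 2^31 in perspective, a short Python sketch of how quickly a 32-bit id space runs out at a few constant message rates. The rates are illustrative assumptions, not figures from the thread.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def days_to_exhaust(id_limit: int, msgs_per_second: float) -> float:
    """Days until a counter starting near 0 reaches id_limit at a constant rate."""
    return id_limit / msgs_per_second / SECONDS_PER_DAY


for name, limit in [("signed 32-bit", 2**31), ("unsigned 32-bit", 2**32)]:
    for rate in (100, 1_000, 10_000):  # hypothetical messages per second
        print(f"{name:15} at {rate:>6} msg/s: {days_to_exhaust(limit, rate):7.1f} days")
```

At a hypothetical 1,000 messages per second, a signed 32-bit counter is used up in under a month.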

slacy 2009-06-19 00:25:49

Answer 5

+1 A:

Don't want to be the next Twitter, eh? lol

If you're worried about scalability, consider using a GUID (uniqueidentifier) instead.

They are only 16 bytes (twice the size of a bigint), but they can be assigned independently on multiple database or business-logic (BL) servers without worrying about collisions.

Because ordinary GUIDs are random, use NEWSEQUENTIALID() (in SQL Server 2005 and later) or a COMB technique (in your business logic or a pre-2005 SQL Server database) to ensure that each GUID is "higher" than the last one, which speeds up inserts into your table.
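A minimal Python sketch of the idea behind a COMB: replace part of a random GUID with a timestamp so that values generated later sort higher. This variant is an assumption for illustration, not SQL Server's `NEWSEQUENTIALID()`: it puts a 48-bit millisecond timestamp in the *first* six bytes, so byte-wise and string-wise ordering follow creation time. COMB implementations aimed at SQL Server usually put the timestamp in the last six bytes instead, because that is the part SQL Server weighs most when ordering `uniqueidentifier` values. `comb_uuid` is a hypothetical name.

```python
import os
import time
import uuid
from typing import Optional


def comb_uuid(now_ms: Optional[int] = None) -> uuid.UUID:
    """Time-prefixed GUID: 6 bytes of Unix milliseconds + 10 random bytes.

    The UUID version/variant bits are not set, so this is a sortable
    128-bit value in GUID form rather than a standards-conforming UUID.
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    prefix = now_ms.to_bytes(6, "big")  # big-endian so earlier times sort first
    return uuid.UUID(bytes=prefix + os.urandom(10))


a = comb_uuid(1_245_000_000_000)  # two fixed timestamps, one second apart
b = comb_uuid(1_245_000_001_000)
print(a < b, str(a) < str(b))     # True True
```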

If you start with a number that high, some "genius" programmer will either subtract 2^32 to squeeze it into an int, or will just ignore the first digit (which is "always the same" until you pass your first billion or so messages).

richardtallent 2009-06-15 23:55:14

The temptation to use an AUTO_INCREMENT value is very high, although I'm now thinking that maybe I'll just use a random 128-bit value for each entry. I'm not really sure that I need something as sophisticated (and heavyweight) as a GUID. The identifiers are private to my system. The problem is that what I'd really like is for my RDBMS engine (MySQL) to auto-assign these values.
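Generating a random 128-bit value in application code is a one-liner with Python's `secrets` module; having MySQL auto-assign it is a separate matter and is not shown. A minimal sketch, where `new_message_key` is a hypothetical name:

```python
import secrets


def new_message_key() -> str:
    # 128 random bits from the OS CSPRNG, encoded as 32 hex characters
    # (fits a CHAR(32) column; the raw 16 bytes would fit BINARY(16)).
    return secrets.token_hex(16)


key = new_message_key()
print(len(key))  # 32
```

Collisions at 128 random bits are vanishingly unlikely, but a UNIQUE index on the column is still a sensible safety net.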

slacy 2009-06-19 00:24:00

Answer 6

+2 A:

Why use incrementing IDs? These require locking and will kill any plans for distributing your service over multiple machines. I would use UUIDs. API users will likely store these as opaque character strings, which means you can probably change the scheme later if you like.

If you want to ensure that messages have an order, implement the ordering like a linked list:

---
id: 61746144-3A3A-5555-4944-3D5343414C41
msg: "Hello, world"
next: 006F6F66-0000-0000-655F-444E53000000
prev: null
posted_by: jrockway
---
id: 006F6F66-0000-0000-655F-444E53000000
msg: "This is my second message EVER!"
next: 00726162-0000-0000-655F-444E53000000
prev: 61746144-3A3A-5555-4944-3D5343414C41
posted_by: jrockway
---
id: 00726162-0000-0000-655F-444E53000000
msg: "OH HAI"
next: null
prev: 006F6F66-0000-0000-655F-444E53000000
posted_by: jrockway

(As an aside, if you are actually returning the results as YAML, you can use & and * references instead of just using the IDs as data. Then the client will get the linked-list structure "for free".)
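A minimal Python sketch of reading messages back in order from such a linked list: find the message whose `prev` is null, then follow `next` pointers. The dictionaries below mirror the three records above but use shortened, hypothetical ids.

```python
from typing import Dict, List, Optional

# Messages keyed by id; "next"/"prev" hold neighbouring ids (None means null).
messages: Dict[str, dict] = {
    "a1": {"msg": "Hello, world", "next": "b2", "prev": None},
    "b2": {"msg": "This is my second message EVER!", "next": "c3", "prev": "a1"},
    "c3": {"msg": "OH HAI", "next": None, "prev": "b2"},
}


def in_order(store: Dict[str, dict]) -> List[str]:
    # The head of the list is the only message without a predecessor.
    head: Optional[str] = next(k for k, m in store.items() if m["prev"] is None)
    ordered = []
    while head is not None:
        ordered.append(store[head]["msg"])
        head = store[head]["next"]
    return ordered


print(in_order(messages))
# ['Hello, world', 'This is my second message EVER!', 'OH HAI']
```

Walking the list takes one lookup per message, so against a database it is cheapest when the rows are fetched in bulk first, as the in-memory dictionary does here.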

jrockway 2009-06-16 00:54:15

Answer 7

A:

One thing I don't understand is why developers don't grasp that they don't need to expose their AUTO_INCREMENT field. For example, richardtallent mentioned using GUIDs as the primary key. I say go one better: use a 64-bit integer for your table's ID/primary key, but also use a GUID, or something similar, as your publicly exposed ID.

An example Message table:

Name           | Data Type
-------------------------------------
Id             | BigInt - Primary Key
Code           | Guid
Message        | Text
DateCreated    | DateTime

Then your data looks like:

Id | Code                                 | Message | DateCreated
-------------------------------------------------------------------------------
1  | 81e3ab7e-dde8-4c43-b9eb-4915966cf2c4 | ....... | 2008-09-25T19:07:32-07:00
2  | c69a5ca7-f984-43dd-8884-c24c7e01720d | ....... | 2007-07-22T18:00:02-07:00
3  | dc17db92-a62a-4571-b5bf-d1619210245a | ....... | 2001-01-09T06:04:22-08:00
4  | 700910f9-a191-4f63-9e80-bdc691b0c67f | ....... | 2004-08-06T15:44:04-07:00
5  | 3b094cf9-f6ab-458e-965d-8bda6afeb54d | ....... | 2005-07-16T18:10:51-07:00

Code is what you would expose to the public, whether in a URL, a service, CSV, XML, etc.
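A minimal sketch of the same schema using Python's built-in `sqlite3` module, so it runs without a server. SQLite has no native GUID type and no BIGINT/INT distinction: an `INTEGER PRIMARY KEY` column is its 64-bit rowid, and the code is stored as text. Table and column names follow the example above; the `create_message` and `find_message` helpers are hypothetical.

```python
import sqlite3
import uuid
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE Message (
           Id          INTEGER PRIMARY KEY,   -- internal key, used for joins
           Code        TEXT NOT NULL UNIQUE,  -- public identifier
           Message     TEXT,
           DateCreated TEXT
       )"""
)


def create_message(text: str) -> str:
    """Insert a message and hand back only its public Code."""
    code = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO Message (Code, Message, DateCreated) VALUES (?, ?, ?)",
        (code, text, datetime.now(timezone.utc).isoformat()),
    )
    return code


def find_message(code: str):
    """Look a message up by its public Code; Id never leaves the data layer."""
    row = conn.execute("SELECT Message FROM Message WHERE Code = ?", (code,)).fetchone()
    return row[0] if row else None


public_code = create_message("hello")
print(find_message(public_code))  # hello
```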

Jordan S. Jones 2009-06-16 02:57:01

What is the point of an ID column in your example?

jrockway 2009-06-16 10:44:51

It's still used internally for foreign keys, and it could still be used in internal applications. The whole point is that you don't have to expose it publicly.

Jordan S. Jones 2009-06-16 16:22:53

The thing I don't like about the GUID-based ideas is that I would likely be exposing these large and unwieldy values in my URLs, e.g. http://mysite/message/3b094cf9-f6ab-458e-965d-8bda6afeb54d instead of http://mysite/message/5. I like the latter, although once you get into the billions, there's not a huge difference between the two schemes.

slacy 2009-06-19 00:27:51

Personally, I'm not much of a GUID fan, for the same reasons as you. In the past I've used a zero-padded string that combined a random number and a sequential number with a character prefix, for example MSG003240001 or MSG008290002.
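A minimal Python sketch of that format. The split is an assumption read off the two examples: a `MSG` prefix, a zero-padded five-digit random number, then a zero-padded four-digit sequence number. `make_code` is a hypothetical helper.

```python
import secrets
from typing import Optional


def make_code(seq: int, rand: Optional[int] = None) -> str:
    # Assumed layout: "MSG" + 5-digit random part + 4-digit sequence part.
    if rand is None:
        rand = secrets.randbelow(100_000)  # 00000..99999
    if not 0 <= seq <= 9_999:
        raise ValueError("the assumed layout only has room for a 4-digit sequence")
    return f"MSG{rand:05d}{seq:04d}"


print(make_code(1, rand=324))  # MSG003240001
print(make_code(2, rand=829))  # MSG008290002
```

The fixed-width sequence field is a limitation of this assumed layout; a production scheme would need to widen it or define a rollover rule.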

Jordan S. Jones 2009-06-19 03:35:47
