views:

1293

answers:

3

I'm working on an application that will be implementing a hex value as a business key (in addition to an auto increment field as primary key) similar to the URL id seen in Gmail. I will be adding a unique constraint to the column and was originally thinking of storing the value as a bigint to get away from searching a varchar field but was wondering if that's necessary if the field is unique.

Internal joins would be done using the auto increment field and the hex value would be used in the where clause for filtering.

What sort of performance hit would there be in simply storing the value as a varchar(x), or perhaps a char(x) over the additional work in doing the conversion to and from hex to store the value as an integer in the database? Is it worth the additional complexity?

I did a quick test on a small number of rows (50k) and had similar search result times. If there is a large performance issue would it be linear, or exponential?

I'm using InnoDB as the engine.

+3  A: 

Is your hex value a GUID? Although I used to worry about the performance of such long items as indexes, I have found that on modern databases the performance difference on even millions of records is fairly insignificant.

A potentially larger problem is the memory that the index consumes (16 byte vs 4 byte int, for example), but on servers that I control I can allocate for that. As long as the index can be in memory, I find that there is more overhead from other operations that the size of the index element doesn't make a noticeable difference.

On the upside, if you use a GUID you gain server independence for records created and more flexibility in merging data on multiple servers (which is something I care about, as our system aggregates data from child systems).

There is a graph on this article that seems to back up my suspicion: Myths, GUID vs Autoincrement

Godeke
A: 

The hex value is generated from a UUID (Java's implementation); it's hashed and truncated to smaller length (likely 16 characters). The algorithm for which is still under discussion (currently SHA). An advantage I see of storing the value in hex vs integer is that if we needed to grow the size (which I don't see happening with this application at 16 char) we could simply increase the truncated length and leave the old values without fear of collision. Converting to integer values wouldn't work as nicely for that.

The reason for the truncation vs simply using a GUID/UUID is simply to make the URL's and API's (which is where these will be used) more friendly.

tgm
Godeke
A: 

All else being equal, keeping the data smaller will make it run faster. Mostly because it'll take less space, so less disk i/o, less memory needed to hold the index, etc etc. 50k rows isn't enough to notice that though...

Ask Bjørn Hansen