views: 407

answers: 4

I am writing a new program and it will require a database (SQL Server 2008). Everything I am running now for the system is 64-bit, which brings me to this question. For all of the Id columns in various tables, should I make them all INT or BIGINT? I doubt the system will ever surpass the INT range but it is a possibility within some of the larger financial tables I suppose. It seems like INT is the standard though...

+5  A: 

You should use the smallest data type that makes sense for the table in question. That includes using smallint or even tinyint if there are few enough rows.

You'll save space on both data and indexes and get better index performance. Using a bigint when all you need is a smallint is similar to using a varchar(4000) when all you need is a varchar(50).

Even if the machine's native word size is 64 bits, that only means that 64-bit CPU operations won't be any slower than 32-bit operations. Most of the time, they also won't be faster, they'll be the same. But most databases are not going to be CPU bound anyway, they'll be I/O bound and to a lesser extent memory-bound, so a 50%-90% smaller data size is a Very Good Thing when you need to perform an index scan over 200 million rows.
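To put rough numbers on the space argument, here is a quick back-of-the-envelope sketch (the 200-million-row figure is the one from the answer; the sizes count only the raw key column, ignoring page overhead and fill factor):

```python
# Raw bytes for a single key column across many rows, by SQL Server integer type.
ROWS = 200_000_000  # the index-scan example above

for name, size in [("tinyint", 1), ("smallint", 2), ("int", 4), ("bigint", 8)]:
    mb = ROWS * size / 1_000_000
    print(f"{name:8s}: {mb:6,.0f} MB")

# smallint vs. bigint is a 75% reduction, tinyint vs. bigint is 87.5% -
# in line with the 50%-90% range mentioned above.
```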

Aaronaught
@Aaronaught good post +1, quick question though; I was under the impression that varchar(50), varchar(4000), and varchar(max) all take up the same space for a given string shorter than 50 characters - the only difference is the limit SQL Server puts on how long the field can be. (http://msdn.microsoft.com/en-us/library/aa258242(SQL.80).aspx)
Hogan
@Hogan: Good point. Sensible max sizes are better for the sake of accurately describing domain requirements, but a better analogy would probably have been `char(10)` vs. `char(50)`.
Aaronaught
+3  A: 

You should judge each table individually as to what datatype would meet the needs for each one. If an INTEGER would meet the needs of a particular table, use that. If a SMALLINT would be sufficient, use that. Use the datatype that will last, without being excessive.
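That per-table judgment can be mechanized. A small sketch (the function name is mine, and it assumes SQL Server's signed integer ranges plus the unsigned TINYINT):

```python
def smallest_int_type(min_value: int, max_value: int) -> str:
    """Smallest SQL Server integer type covering [min_value, max_value]."""
    if 0 <= min_value and max_value <= 255:
        return "tinyint"  # 1 byte, unsigned 0..255
    for name, nbytes in [("smallint", 2), ("int", 4), ("bigint", 8)]:
        lo, hi = -2 ** (8 * nbytes - 1), 2 ** (8 * nbytes - 1) - 1
        if lo <= min_value and max_value <= hi:
            return name  # 2, 4, or 8 bytes, signed
    raise ValueError("range exceeds bigint")

print(smallest_int_type(0, 200))     # tinyint
print(smallest_int_type(0, 50_000))  # int (50,000 exceeds smallint's 32,767)
```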

AdaTheDev
+10  A: 

OK, let's do a quick math recap:

  • INT is 32-bit and gives you basically 4 billion values - if you only count the values larger than zero, it's still 2 billion. Do you have this many employees? Customers? Products in stock? Orders in the lifetime of your company? REALLY?

  • BIGINT goes way way way beyond that. Do you REALLY need that?? REALLY?? If you're an astronomer, or into particle physics - maybe. An average Line of Business user? I strongly doubt it.
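The numbers in that recap check out (SQL Server's INT and BIGINT are signed two's-complement types, 4 and 8 bytes respectively):

```python
int_values = 2 ** 32   # distinct values a 4-byte INT can hold
int_max = 2 ** 31 - 1  # largest positive INT
bigint_max = 2 ** 63 - 1  # largest positive BIGINT

print(int_values)   # 4,294,967,296 - "basically 4 billion"
print(int_max)      # 2,147,483,647 - the "still 2 billion" positive values
print(bigint_max)   # 9,223,372,036,854,775,807 - roughly 9.2 quintillion
```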

Imagine you have a table with - say - 10 million rows (orders for your company). Let's say, you have an Orders table, and that OrderID which you made a BIGINT is referenced by 5 other tables, and used in 5 non-clustered indices on your Orders table - not overdone, I think, right?

10 million rows, times 5 referencing tables plus 5 non-clustered indices, is 100 million instances where you're using 8 bytes each instead of 4 - 400 million bytes = 400 MB. A total waste... you'll need more data and index pages, and your SQL Server will have to read more pages from disk and cache more pages. That's not beneficial for your performance - plain and simple.
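The arithmetic in that example, spelled out (the row and reference counts are the hypothetical ones from the answer):

```python
rows = 10_000_000      # orders in the hypothetical Orders table
uses = 5 + 5           # 5 referencing tables + 5 non-clustered indices
extra_per_key = 8 - 4  # bigint (8 bytes) where int (4 bytes) would do

wasted_bytes = rows * uses * extra_per_key
print(wasted_bytes)    # 400,000,000 bytes, i.e. about 400 MB
```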

PLUS, what most programmers don't think about: yes, disk space is dirt cheap. But that wasted space also takes up room in your SQL Server's RAM and database cache - and that space is not dirt cheap!

So to make a very long post short: use the smallest INT type that really suits your needs; if you have 10-20 distinct values to handle, use TINYINT. For an orders table, I believe INT should be PLENTY - BIGINT is only a waste of space.

Plus: should any of your tables ever really get close to 2 or 4 billion rows, you'll still have plenty of time to upgrade that table to a BIGINT ID, if that's really needed.

marc_s
I actually had to perform such an update, and you're right, we had well over 6 months of warning, and it wasn't that hard to do. Ironically, the entire key is about to disappear in the next version, as it really wasn't necessary. Normally I abhor natural keys, but when you have billions of rows in your table it's time to start thinking about them; 100 GB more free disk space and one less index to update when inserting 50,000 more rows were very good incentives.
Aaronaught
Thanks for the answer!
Rob Packwood
+1  A: 

The alignment of 32-bit values on the x86 architecture, or 64-bit values on x64, is called data structure alignment.

This has no bearing on data in a database, because there it's disk space, the data cache, and table/index architecture that affect performance (as mentioned in other answers).

Remember, it's not the CPU accessing the data as such. It's the DB engine code (which may be aligned, but who cares?) that runs on the CPU and manipulates your data. When/if your data goes through the CPU it certainly won't be in the same on-disk structures.

gbn