views:

134

answers:

5

I have a star schema type data base, with fact tables that have many foreign keys to dimension tables. The number of records in each dimension table is small - often less than 256 bytes, but always less than 64k. The fact tables typically have hundreds of thousands of records, so I want maximize join speed.

I'd like to use tinyints and smallints, but a coworker says I'm crazy to worry about this and just use 4 byte ints in every case. Who is right?

+4  A: 

Go with the 4 Byte Ints and do your optimisation elsewhere. Any effort you spend here won't gain you enough of a return compared with ease of coding, ease of use and ease of maintenance that a simple schema offers.

Brody
Not so, index width can be extremely significant to performance, especially in a star schema, where the PK is often a multiple column composite integral key
Charles Bretana
A: 

As always with performance questions, it depends. If your fact rows are tiny, say 20 bytes each, then a saving of two bytes per row will save 400 bytes, and allow you to fit an extra 20 rows on each page. If your fact rows are larger, say 500 bytes, then you'll only be able to save 32 bytes, which won't matter at all.

The benefit of using an INT over a SMALLINT is that you don't have to worry about what happens if you suddenly get more rows than you expected.

SQL Server 2008 includes row-level compression, which means you can declare the value as a 4-byte INT, but it will store the value in the most appropriately sized type for each row.

Jim McLeod
Jim, does SQL 2k8 do row-level compression for index page entries as well as data pages?
Charles Bretana
Yes, row-level and page-level compression applies to indexes (page level doesn't apply to non-leaf index pages). Details at http://msdn.microsoft.com/en-us/library/cc280449.aspx. However, you'll need Enterprise or Developer edition to use compression.
Jim McLeod
A: 

4-byte Integers for primary keys are for most solutions fine.

If you want some flexibility in where you can create your PK value and do some data replication later on, you might want to think about using uniqueidentifiers. A Guid is easily created at the database, within a stored procedure, within a DAL layer or anywhere else and is guaranteed to be unique.

Sometimes only that can give your solution some additional performance by not having to do a database lookup to get a new record ID. (i.e. create it in a DAL layer and store it right away instead of having to use something like scope_identity() or @@Identity)

Hope this helps.

Jeroen Landheer
+2  A: 

Yr co-worker is wrong. If you use four byte integers for the foreign Keys, then the primary keys in the fact table have to be 4-byte integers as well. And then you are making your fact table wider than it needs to be, reducing the number of records that can fit on a single index page. To the degree that this changes the width of the primary Key Index, this will adversely affect index performance. If your Primary key could have been two tinyInts and 3 smallints, and you change to five 4-byte ints, you have changed the width of the index from 8 bytes wide to 20 bytes wide. Your index will have less than half as many entries per I/O page, and it will require twice as many logical and/or physical reads to traverse.

NOTE: As Jim McLeod's answer below, SQL Server 2008, (Enterprise or Developer edition), includes row-level compression, which means you can declare the value as a 4-byte INT, but it will store the value in the most appropriately sized type for each row.

Charles Bretana
+2  A: 

On a 32 bit server, smaller ints aren't going to save you anything in CPU performance, even less on a 64 bit server. Maybe you'll get a disk saving and therefore some disk improvement, but overall the total improvement may be negligble.