views: 1087

answers: 5
When designing a lookup table (enum) in SQL Server 2005, if you know the number of entries will never get very high, should you use tinyint instead of int? I'm most concerned about performance, particularly the efficiency of indexes.

Let's say you have these representative tables:

Person
------
PersonId int  (PK)
PersonTypeId tinyint  (FK to PersonTypes)

and

PersonTypes
-----------
PersonTypeId tinyint  (PK)
PersonTypeName varchar(50)
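
In DDL terms, the design would be something like this (constraint names are just for illustration):

-- Lookup table first, so the FK has something to reference
CREATE TABLE PersonTypes (
    PersonTypeId   tinyint     NOT NULL PRIMARY KEY,
    PersonTypeName varchar(50) NOT NULL
);

CREATE TABLE Person (
    PersonId     int     IDENTITY(1,1) NOT NULL PRIMARY KEY,
    PersonTypeId tinyint NOT NULL
        CONSTRAINT FK_Person_PersonTypes
        REFERENCES PersonTypes (PersonTypeId)
);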

The obvious factors are data size and coding hassle. When we get to 100 million rows in the Person table, we're storing 300 million fewer bytes with tinyint as opposed to int, plus the space taken up by our indexes. Not a huge amount of data, but significant if the design decision is applied to dozens of big tables. The coding hassle, of course, comes from all those casting problems back in the ASP.NET C#/VB code.

If we set aside those two issues, what else comes into play? Will queries be much more efficient due to the decreased size of the index pages? Or is there some sort of padding that happens that will just negate the benefits? Any other gotchas?

I've always just used ints personally, but I'm considering tinyint for an upcoming redesign/migration effort on some huge tables, so I'd love to get some advice.

[Edit]

After experimenting with this, the coding hassles I anticipated turned out to be a non-issue. Changing from int to tinyint hasn't resulted in any casting problems at all.

+3  A: 

I doubt that using tinyint (or smallint) instead of int is going to have much performance benefit except in rare edge cases. You can easily build a test app for this, though: create some test tables, run a million inserts/updates/selects against them, and compare performance.
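
A rough sketch of such a test, assuming a scratch SQL Server database (all names here are made up for the example):

-- Two identical tables except for the width of TypeId
CREATE TABLE TestInt     (Id int IDENTITY(1,1) PRIMARY KEY, TypeId int     NOT NULL);
CREATE TABLE TestTinyInt (Id int IDENTITY(1,1) PRIMARY KEY, TypeId tinyint NOT NULL);

-- Load a million rows into each
DECLARE @i int;
SET @i = 0;
WHILE @i < 1000000
BEGIN
    INSERT INTO TestInt     (TypeId) VALUES (@i % 10);
    INSERT INTO TestTinyInt (TypeId) VALUES (@i % 10);
    SET @i = @i + 1;
END;

CREATE INDEX IX_TestInt_TypeId     ON TestInt (TypeId);
CREATE INDEX IX_TestTinyInt_TypeId ON TestTinyInt (TypeId);

-- Compare logical reads for equivalent queries
SET STATISTICS IO ON;
SELECT COUNT(*) FROM TestInt     WHERE TypeId = 5;
SELECT COUNT(*) FROM TestTinyInt WHERE TypeId = 5;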

Craig
+10  A: 

The narrower a table (or index entry) is, the more records (or index nodes) fit on a single IO page, and the fewer physical (and logical) IO operations are required for any query. Also, the more index nodes there are on a single page, the fewer levels the index may need from root to leaf; if making the table narrower pushes you past the threshold where the index can be one level shallower, that can have a dramatic effect on performance.

If by switching to tinyint you change your table from 200 bytes wide to 197 bytes wide, it probably won't make any difference... But if you change it from 20 bytes to 14 (say you have two ints in there), then it could be dramatic...
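
On SQL Server 2005 you can check whether narrowing actually changed the index depth or page count; something like this against sys.dm_db_index_physical_stats works ('Person' is a placeholder for your own table):

-- Leaf-level depth, page count, and average record size per index
SELECT index_id, index_depth, page_count, avg_record_size_in_bytes
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID('Person'), NULL, NULL, 'DETAILED')
WHERE index_level = 0;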

Charles Bretana
As best I can estimate, the effect would be 88 -> 70 bytes per row for one of the main tables I have in mind.
Eric Z Beard
Probably not going to matter much then... I don't know which DB you're using, but on SQL Server, IO pages are 8K, so for table scans/seeks you would be going from 93 to 117 records per page... What about the indexes? Are these int columns in your indexes? It might have a bigger effect there.
Charles Bretana
Yeah, the table has more than a dozen indexes, and is used heavily for OLTP. Index size on disk is 10x data size. These columns are actually coming from a separate table being denormalized to reduce the number of joins needed. Indexes will almost all be modified to include these new fields.
Eric Z Beard
Then if the index only has one or two of these fields, and you change them from int (4 bytes) to tinyint (1 byte), you will see nearly a threefold reduction in index size and a corresponding increase in the number of index nodes per IO page.
Charles Bretana
+2  A: 

Memory 101: smaller stuff means holding more in RAM at once and thus fewer hard disk reads. If the DB is big enough and you're running certain kinds of queries, this could be a very serious factor. But it probably won't make a big difference.
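
If you want to see how much of the buffer cache each table is occupying, a query along these lines against the 2005 DMVs should work (the usual pattern; run it in the target database):

-- Pages currently held in the buffer pool, per object
SELECT OBJECT_NAME(p.object_id) AS table_name,
       COUNT(*)                 AS cached_pages  -- 8K each
FROM sys.dm_os_buffer_descriptors AS bd
JOIN sys.allocation_units AS au
  ON bd.allocation_unit_id = au.allocation_unit_id
JOIN sys.partitions AS p
  ON au.container_id = p.hobt_id AND au.type IN (1, 3)  -- in-row and row-overflow data
WHERE bd.database_id = DB_ID()
GROUP BY p.object_id
ORDER BY cached_pages DESC;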

Brian
Except most of the time, anything smaller than an int will be expanded to an int in memory - and if it isn't, the item that comes next will be aligned to waste the 2 bytes you're supposedly saving anyway.
Paul Tomblin
+1  A: 

There is also the factor of index maintenance and disk/tape backups, which take up space as well, but I'd say the most important considerations are IO and memory performance.

Sam
+1  A: 

Any other gotchas?

I'm not sure if this is the kind of "gotcha" you mean, but I've run into situations where using a datetime instead of a smalldatetime gave incorrect functional behavior, because the lower-precision smalldatetime didn't compare as equal to the higher-precision datetime for two dates that were otherwise "the same".
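
That one is easy to reproduce, since smalldatetime rounds to the nearest minute and the implicit conversion then makes the equality check fail:

-- smalldatetime rounds 12:34:30 up to 12:35:00, so this returns
-- 'not equal' even though both literals are "the same" moment
SELECT CASE
         WHEN CAST('2008-01-01 12:34:30' AS datetime)
            = CAST('2008-01-01 12:34:30' AS smalldatetime)
         THEN 'equal' ELSE 'not equal'
       END;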

There's no chance of that happening here, since a tinyint / smallint / int / bigint will all compare as identical for the same numeric integer value. So you're obviously safe on that count, not that it answers your question exactly.
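
For example:

-- Width doesn't affect integer comparison; this returns 'equal'
SELECT CASE
         WHEN CAST(5 AS tinyint) = CAST(5 AS bigint)
         THEN 'equal' ELSE 'not equal'
       END;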

Ian Varley