For you database design/performance gurus out there.

I'm designing a table and have the choice of using either int or nvarchar(128) for a column. Assume space is not a problem. My question is: which will give better performance,

when I search with int column

where ID = 12324

or when I search with nvarchar column (the Key is the entire value, so I'm not using LIKE operator)

where Key = 'my str'

I'm sure for smaller datasets it doesn't matter, but let's assume this data will be in the millions of rows.

+2  A: 

The main performance issue here is the size of the field: an int is 4 bytes, whereas an nvarchar(128) can take up to 256 bytes (2 bytes per character), plus 2 bytes of length overhead.

All of this needs to be managed by SQL Server, so managing an int will be much faster than an nvarchar(128).

Oded
+1 good point. Also, there's a little less work for the CPU to process an int rather than a string. With millions of rows this could result in a non-negligible difference. I'd be interested to see some real performance testing on this.
P.Brian.Mackey
+4  A: 

Space is always a problem in databases. Wider keys mean fewer entries per page and more pages scanned to aggregate and sum values, which means more I/O and lower performance. For clustered indexes, this problem is multiplied by each non-clustered index, as they have to reproduce the lookup key (the clustered key) in their leaves. So a key of type nvarchar(128) will almost always be worse than an INT.

On the other hand, don't use an INT key if it is not appropriate. Always use the appropriate key, considering your queries. If you are always going to query by an nvarchar(128) column value, then it is possibly a good clustered key candidate. If you're going to aggregate by the nvarchar(128) key, then it is likely a good clustered key candidate.

Remus Rusanu
+1: Natural > artificial keys, but there are very few natural keys IME.
OMG Ponies
+2  A: 

INT will be faster - here's why:

  • SQL Server organizes its data and index into pages of 8K
  • if you have an index page with INT key on it, you get roughly 2,000 INT entries
  • if you have NVARCHAR(128) and you use on average 20 characters, that's 40 bytes per entry, or roughly 200 entries per page

So for the same amount of index entries, the NVARCHAR(128) case would use ten times as many index pages.

Loading and searching those index pages will incur significantly more I/O operations.
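As a rough check of the arithmetic above, here is a minimal Python sketch. It assumes the whole 8K page holds nothing but key bytes, ignoring the page header and per-row overhead, so the numbers are upper bounds rather than what SQL Server would actually store. The function and variable names are made up for illustration.

```python
# Back-of-the-envelope estimate of index entries per page.
# Assumption: the full page is available for keys (no header, no row overhead).
PAGE_BYTES = 8 * 1024  # 8K page, as stated in the answer


def entries_per_page(key_bytes, page_bytes=PAGE_BYTES):
    """Return an upper bound on how many keys of key_bytes fit in one page."""
    return page_bytes // key_bytes


int_key_bytes = 4            # INT is 4 bytes
nvarchar_key_bytes = 20 * 2  # 20 characters on average, 2 bytes each

int_entries = entries_per_page(int_key_bytes)
nvarchar_entries = entries_per_page(nvarchar_key_bytes)
ratio = int_entries / nvarchar_entries

print(int_entries)       # 2048
print(nvarchar_entries)  # 204
print(round(ratio, 1))   # 10.0
```

Real pages hold somewhat fewer entries than this in both cases, but the roughly tenfold gap between the two key types is the point.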

So to make things short: if you can, always use INT.

marc_s
A: 

I would use the int for performance (especially if this is going to be used in joins) and put a unique index on the potential natural key for data integrity.
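A minimal sketch of that layout, using SQLite through Python's sqlite3 module as a stand-in so it can be run anywhere. The DDL syntax in SQL Server differs, and the table, column, and index names here are hypothetical.

```python
import sqlite3

# In-memory SQLite database, used only as a stand-in for SQL Server.
conn = sqlite3.connect(":memory:")

# Integer surrogate key for lookups and joins (hypothetical table and columns).
conn.execute(
    """
    CREATE TABLE items (
        id INTEGER PRIMARY KEY,
        natural_key TEXT NOT NULL
    )
    """
)

# Unique index on the natural key enforces data integrity.
conn.execute("CREATE UNIQUE INDEX ux_items_natural_key ON items (natural_key)")

conn.execute("INSERT INTO items (natural_key) VALUES (?)", ("my str",))

# A second row with the same natural key should be rejected by the index.
try:
    conn.execute("INSERT INTO items (natural_key) VALUES (?)", ("my str",))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Look up the surrogate id by natural key; other tables would join on id.
row = conn.execute(
    "SELECT id FROM items WHERE natural_key = ?", ("my str",)
).fetchone()

print(duplicate_rejected)  # True
print(row)                 # (1,)
```

Other tables would then reference the narrow integer id, while the unique index still keeps duplicate natural keys out.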

HLGEM