views:

304

answers:

5

I have a database that uses codes. Each code can be anywhere from two characters to ten characters long.

In MS SQL Server, is it better for performance to use char(10) for these codes and RTRIM them as they come in, or should I use varchar(10) and not have to worry about trimming the extra whitespace? I need to get rid of the whitespace because the codes will then be used in application logic for comparisons and what not.

As for the average code length, hard to tell exactly. Assume all codes are a random length between one and ten. Edit: A rough estimation is about 4.7 characters for the average length of a code.

+4  A: 

I'd vote for varchar.

I say varchar to avoid the TRIM which would invalidate index usage (unless you use a computed column etc which defeats the purpose, no?).

Otherwise at length 10, it would be 50/50 but TRIM tips the balance towards varchar and wins out over the fixed length benefit

gbn
why would there be any need to use TRIM?
KM
Cuz the guy is talking about using char instead of varchar
gmagana
@KM: a char column compare with a varchar literal will convert the column to varchar (datatype precedence). Then you'll need to TRIM it. Or CAST the literal to char. Whatever, it's simpler just to use varchar to keep code simpler
gbn
A: 

In one old book I read that in general char is a better choice when for the most of the records the real string length is at least 60% of maximum; in your example - if more than half of all records have length 6 or greater. Otherwise, use varchar.

a1ex07
I would suspect blanket statements like that simply because they make assumptions about the platform. that stuff could have been optimized or the inderlying reasons changed within a minor version update. For example, it used to be faster in Oracle to do a COUNT(1) than COUNT([star]). How long would it take Oracle to make COUNT([star]) as fast as COUNT(1)? Two lines of code? So stuff like that should be suspect in the long-term.
gmagana
Sorry, in the comment above I don't know how to escape the asterisk character
gmagana
+1  A: 

Your requirements are a textbook definition of someone who needs to use varchar.

If you want to worry about performance, worry about DB design and writing good SQL. Char vs VarChar internals are well-optimized by the DB vendors.

gmagana
+1  A: 

I'm confident that you wouldn't be able to tell a speed difference between the two.

codingguy3000
Oh yes you will. 10 bytes vs. 6.7 avg length (4.7 + lenght) over 1M rows gives 3.3Mb less memory space consumed, 3.3Mb less data to read and write, 3.3Mb less log to write, backups are smaller with 3.3Mb etc etc.
Remus Rusanu
I prefer the formual (TimeToComparePerformanceCosts * HourlyRateOfPerson) >= (PerformanceSavings% * HourlyRateOfHardwareCosts)
Nat
+2  A: 

As a general rule, always favor smaller storage over extra CPU. Because the driving factor of database performance is always IO and smaller data records means more records per page and this in turn means fewer IO requests. The extra CPU involved in handling the variable length is not going to be a factor. Historically, in the dark ages of '80s and even in the '90s it may have been a measurable factor, but today is just noise. Because the CPU and memory access have increased tremendously, but the IO speed has stayed pretty much constant. That's why 'old books' advice does not apply today. Unless you have a constant field like char(2) or similar, just use varchar, always.

Remus Rusanu