What SQL Server DataType should I use for a Text field which will be JOINED, "LIKE'd" and Queried against alot?

views:

answers:

+4 Q:

What SQL Server DataType should I use for a Text field which will be JOINED, "LIKE'd" and Queried against alot?

The title really says it all, a bit more info though for those who bothered to click.

Strings will be of variable length typically between 2-5 characters in length, could occasionalyl exceed 5 characters and be upwards of 10. Never more than 10.
Will be queried on like such:
- SELECT ... WHERE ... = 'abcd';
- SELECT ... WHERE ... LIKE 'ab%';
Will be used in table JOINS.

I'm wondering if I should use nvarchar(10) or char(10) or varchar(10)? I don't really know what the advantages or disadvantages of the above are for a case like mine. I appreciate any input you may have.

+7 A:

CHAR(10) will be padded to the defined length with spaces, e.g. if you have CHAR(10) and store "Stack" in it, the contents will really be "Stack.....". Works great for things like state abbreviation (always 2 chars). But the padding does make the querying a bit more cumbersome at time.

VARCHAR(10) will store as many chars as needed - more efficient, but more so for larger strings.

NVARCHAR(10) will be the same - variable length - but with 2 bytes for each character (and NCHAR(10) is the same as CHAR(10) - only 2 bytes per character). Great for when you need to regularly support e.g. Asian, Cyrillic, or Arabic characters or other "non-Latin" alphabets. If you have Western European languages only (English, French, German, Spanish etc.), it's a waste of space.

My gut feeling tells me that CHAR/NCHAR might be a tad quicker when joining, but I don't think it'll really be a significant difference. Otherwise, if you have to reserve 10 characters for each entry, and most of them are only 2 to 5 chars, you are wasting some space, so that's a downside for the CHAR/NCHAR types.

So if I had to decide, I'd probably use VARCHAR(10) (without knowing all your detailed requirements, that is).

marc_s 2010-09-13 19:41:37

+1: But really, don't join on text based info *if you don't absolutely have to*.

OMG Ponies 2010-09-13 19:44:19

@OMG Ponies: yes, totally agree. Works for 2-letter state or language codes, or 3-letter ISO country (or currency) abbreviations - but that's about where it should end, if ever possible

marc_s 2010-09-13 19:45:18

Thanks, I agree with you both. This "code" is an internal ID field, which will be the only common attribute between a few different systems, so mostly it'll be queried, but there will be a fair amount of JOINs between these systems.

Nate Bross 2010-09-13 20:48:50

IF you are designing the field, why not make it an int and use that in all the other systems? That willjoin much faster. If it has to match what the other systems already have, then use the datatype they used.

HLGEM 2010-09-14 13:23:21

Sadly it has to interface with an older system, and said older system is in a propriary database, for which the data type chosen roughly equates to `nvarchar(max)` for the very limmited dataset I'll be dealing with, I'm sure SQL server could have handled it, but I figured `varchar(10)` would be best, but I thought since `char(10)` would be padded the index might be a bit faster, but wanted to get others recommendations, thus I'll be going with @marc_s' suggestion to use `varchar(10)` since it will likely fit the bill best. Had I designed the other systems I would have used `int` :)

Nate Bross 2010-09-20 17:13:35

nvarchar is used if you want to store unicode characters. char(10) would be a waste of storage space.

Use nvarchar if you are supporting languages other than english otherwise varchar should do the trick. comparisons

Don't forget to have a index on this field as this will speed up your joins and comparisons.

Faisal Feroz 2010-09-13 19:42:59

varchar vs. nvarchar: Are you certain to be limited to the US-english printable character set? Or might you need foreign characters?

'char' vs 'varchar': If 10 is a solid, hard limit, then you might be better off with char. The main problem is that the field will always be padded to 10 character, so you'll have to always be Trim()ing them. On some database servers, you have to account for the extra spacing in where clauses (i.e., WHERE col1 = 'abcd ') but I thing MS Sql Server does the sensible thing.

James Curran 2010-09-13 19:44:19

ansaurus

tags:

views:

answers:

What SQL Server DataType should I use for a Text field which will be JOINED, "LIKE'd" and Queried against alot?

related questions