Hi,

I am using SQL Server 2008 Express, and some of our columns are defined as varchar(255). Should I convert these columns to nvarchar(255) or nvarchar(max)?

The reason I ask is that I read nvarchar(255) would actually store only half as many characters for Unicode data (since Unicode characters are 2 bytes each), whereas varchar(255) would let me store the full 255 characters (or is it 255 minus 2 for the offset?).

Would there be any performance hits using nvarchar(max)?

JDs

+4  A: 

Well, not quite - converting to nvarchar(255) doesn't cut the number of characters you can store in half - it still stores 255 characters. It just needs twice as much space (510 bytes vs. 255 bytes).

You should convert to nvarchar - even though it always uses twice as much space - if you:

  • need to support Arabic, Hebrew, Cyrillic, or any of the East Asian languages - only in Unicode will you be able to actually capture those characters
  • need to support other languages which use the "standard" Latin alphabet, but have special characters - things like Eastern European (Slavic) languages with their characters like č ă ě - those will be stored as just c, a, e in a varchar() field
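A quick sketch of that second point. This assumes a database using a Latin1_General collation (a common default); the exact best-fit mapping is collation-dependent, but the accented characters survive only in the nvarchar variable:

```sql
-- Assumes a Latin1_General (code page 1252) collation; the exact
-- best-fit mapping of unsupported characters is collation-dependent.
DECLARE @v varchar(20)  = N'čaě';   -- best-fit mapped to plain c, a, e
DECLARE @n nvarchar(20) = N'čaě';   -- Unicode preserved as-is
SELECT @v AS varchar_value, @n AS nvarchar_value;
```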

nvarchar(max) is a great option - if you really need up to 2 GB of text. Making all string fields nvarchar(max) just to be "consistent" is a really, really bad idea - you'll have massive performance issues. See Remus Rusanu's article on the topic.
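One concrete cost (table and column names here are made up for illustration): a (max) column cannot be a key column in an index, so reaching for nvarchar(max) everywhere also limits how you can index your data:

```sql
CREATE TABLE dbo.Customer (
    Id   int NOT NULL PRIMARY KEY,
    Name nvarchar(max) NOT NULL
);

-- This fails: varchar(max)/nvarchar(max) columns cannot be index key columns.
CREATE INDEX IX_Customer_Name ON dbo.Customer (Name);
```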

marc_s
Thanks for all the information.
JD
+2  A: 

You should have some kind of justification for every data type you use.

nvarchar(255) (in SQL Server) stores 255 Unicode characters (in 510 bytes plus overhead).
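You can verify the character-vs-byte accounting yourself with LEN (characters) and DATALENGTH (bytes):

```sql
DECLARE @s nvarchar(255) = N'hello';
SELECT LEN(@s)        AS char_count,   -- 5 characters
       DATALENGTH(@s) AS byte_count;   -- 10 bytes (2 bytes per character)
```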

It's certainly possible to store ordinary UTF-8 encoded Unicode data in varchar columns - one varchar character per byte in the source (UTF-8 uses multiple bytes as needed for wide characters). In this case, ordinary ASCII data uses only 1 byte per character, so you don't have the double-byte overhead. It has a lot of drawbacks, not least of which is that the database can no longer help as much with collation and other character-manipulation work, since the data is potentially encoded. But, like I said, it's possible.

I recommend char or varchar columns of appropriate lengths for things like account numbers (where a numeric type might not work because zero-padding matters), license numbers, invoice numbers (with letters), postal codes, phone numbers, etc. These are the kinds of columns that NEVER contain wide characters, are usually restricted to Roman letters and numbers only (sometimes not even punctuation), and are often heavily indexed. There is absolutely no need for the overhead of an extra zero high byte for every character in these columns - in the tables, in the indexes, and in the database engine's working set.
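A hypothetical table along those lines (the names are invented for illustration):

```sql
-- ASCII-only identifier columns: single-byte varchar/char keeps the
-- table, its indexes, and the working set compact.
CREATE TABLE dbo.Invoice (
    InvoiceNumber varchar(20) NOT NULL PRIMARY KEY, -- letters and digits
    AccountNumber char(10)    NOT NULL,             -- zero-padded, fixed width
    PostalCode    varchar(10) NULL,
    Phone         varchar(20) NULL
);
```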

I recommend nvarchar for things like names and addresses etc, where wide characters are possible, perhaps even when there is no foreseeable usage in the near term.

I typically never use nchar - I have never needed short codes (typically where I chose char columns) that needed wide characters.

In all cases, the length (or the use of max) really should be fully thought through. I would definitely not use max for names or addresses, and the overhead can be obvious in benchmarking. I have seen casting to varchar(length) in intermediate stages of queries drastically improve performance.
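A sketch of that last point (table and column names invented): if a (max) column is only needed as a short prefix, casting it down to a declared length before sorting or joining keeps the intermediate rows narrow:

```sql
-- Hypothetical: Notes is nvarchar(max), but the query only needs a short prefix.
SELECT CAST(LEFT(o.Notes, 100) AS nvarchar(100)) AS NotePreview
FROM dbo.Orders AS o
ORDER BY CAST(LEFT(o.Notes, 100) AS nvarchar(100));
```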

Cade Roux
Thank you Cade for your input.
JD