views:

1851

answers:

7

We have an application written in C#, using .NET Framework 3.0 or 3.5 or something like that. As storage we use SQL Server, and we use Linq 2 SQL to talk with it.

Currently most (if not all) text columns in the database, are set to the varchar type (of different lengths of course).

But I started thinking... according to MSDN "The string type represents a sequence of zero or more Unicode characters." Does that mean that we should really change those columns to nvarchar for things to be stored correctly? Or how does that work? Their reasoning for setting them to varchar originally was because nvarchar requires twice as much space (if I have understood correctly). And as far as I have seen so far, it works with varchar, but we haven't done very much testing with unusal foreign characters...

Could someone shed some light on this?

+3  A: 

Yes, nvarchar stores characters as unicode, like a .NET string. If you need to store strings containing characters in different languages, you should probably go with nvarchar.

If you have only characters from a single language, you have another option to go with varchar and choose the specific collation for that language (which will save space but makes life much more complicated).

Mehrdad Afshari
In what ways does this "make life much more complicated"?
j0rd4n
Take the space hit, it's not worth it in the long run. Space is cheap.
Chuck Conway
j0rd4n, First of all you must be sure that no other kind of text is going in. Second, some collations are not supported by some versions of SQL server. Third, you should set the specific collation and .... Most of the time, I'll happily take the slight performance and storage cost of nvarchar.
Mehrdad Afshari
A: 

Yes, you really should be using NVARCHAR, or you risk loosing characters due to encoding problems.

I think the operative part of your post here is as far as I have seen so far.

Try some examples using some of the characters found below the center of the characters in charmap, like the arabic characters, see how they are stored.

Lasse V. Karlsen
+4  A: 

Unless you've got text which is guaranteed to be representable in the code page of your database (or you're specifying the collation explicitly) I'd use nvarchar.

There are some cases where you can guarantee that the contents will be ASCII, in which case you could indeed use varchar - but I don't know how significant the benefits of that will be compared with the hassles of making absolutely sure that not only will the contents be ASCII, but you're never going to want anything other than ASCII (or within the specific code page).

Jon Skeet
I agree just use nvarchar. It's safer and if you need it down the road it's will save you the headache.
Chuck Conway
Agree. Disk space is cheap these days.
Petar Repac
+1  A: 

As the others have said, is your application ever going to store 2 byte Unicode (UTF-16/32)? If not then a varchar will be fine for plain ascii (possibly even Window default UTF8, not sure). .NET strings are interned as UTF16 though.

Unless you are holding huge amounts of text in the database and are short on disk space the difference will be minute, so you may as well stick with NVarchar.

Chris S
A: 

NVARCHAR(X) in T-SQL translates to "UTF-16 encoding": each of the X characters is 16 bits wide. If you are storing human-readable text, you're probably better off using NVARCHAR for text storage. VARCHAR(X) implies 8-bit storage for the characters. This implies that some transcoding has to happen between your C# application (which uses UTF-16 internally in its string type) and your database.

Save yourself some grief and use Unicode consistently where you are storing human-readable strings.

John Källén
+4  A: 

There's a lot of perspective from the .net side of things. Here's a thought from the database side of things:

A varchar is half the size of an nvarchar. While this is not significant for many purposes, it is highly significant for indexes. An index that is half as wide, is twice as fast. This is because twice as many values can be stored on a datapage (the unit of database IO).

There are certain strings that you (the app) controls the construction of and would like to use to access important records. Alphanumeric identifiers (such as a customer number) fall into this category. Since you control the construction, you can force these to be safely varchar (and frequently do anyway). Why not gain the benefit of a half-sized double-fast index for this effort you're already doing?

David B
A: 

If you're using SQL2005 then definitely use nvarchar. If you're using SQL2000 then watch out for the 8000 byte total row size limit - you'll run into this sooner with nvarchar as they take up twice as much space.