I'm working on a database for a small web app at my school using SQL Server 2005. I see a couple of schools of thought on the issue of varchar vs nvarchar:

  1. Use varchar unless you deal with a lot of internationalized data, then use nvarchar.
  2. Just use nvarchar for everything.

I'm beginning to see the merits of view 2. I know that nvarchar does take up twice as much space, but that isn't necessarily a huge deal since this is only going to store data for a few hundred students. To me it seems like it would be easiest not to worry about it and just allow everything to use nvarchar. Or is there something I'm missing?

+39  A: 

Always use nvarchar.

Most applications may never need double-byte characters. However, if you need to support double-byte languages and your database schema has only single-byte support, it's really expensive to go back and modify it throughout your application.

The cost of migrating one application from varchar to nvarchar will be much more than the little bit of extra disk space you'll use in most applications.
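
To put a number on that: a later migration means altering every affected column, rebuilding every index that touches it, and re-checking every parameter and string literal in the application. A minimal sketch of just the schema part, with hypothetical table and column names:

    -- Hypothetical example of a later varchar -> nvarchar migration.
    -- Any index or constraint on the column must be dropped first and
    -- recreated afterwards, which is what makes this expensive at scale.
    ALTER TABLE dbo.Student
        ALTER COLUMN FirstName nvarchar(50) NOT NULL;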

Joe Barone
it is far harder to go back and add support for multilingual text/messages, time zones, units of measure and currency, so everyone MUST always code these in their application from day one, ALWAYS (even if it is only on your home page web app)!
KM
@KM, touché... (lol)
dferraro
What about index size, memory usage etc? I assume you always use int when you could use tinyint too "just in case"?
gbn
@KM, your point doesn't quite fit this question and answer, but it's still something very important to consider.
Eduardo Xavier
+6  A: 

Since your application is small, there is essentially no appreciable cost to using nvarchar over varchar, and you save yourself potential headaches down the road if you ever need to store Unicode data.

tbreffni
Famous last words. It's simple to design it correctly now.
gbn
+26  A: 

Disk space is not the issue, but memory and performance will be: double the page reads, double the index size, and strange LIKE and = constant behaviour, etc.

Do you need to store Chinese or other non-Latin scripts? Yes or no...

A good article featuring Tony Rogerson, Joe Celko and Kalen Delaney...

And from MS Books Online, "Storage and Performance Effects of Unicode".
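
As a sketch of the constant behaviour mentioned above (table and column names are hypothetical): comparing a varchar column to a Unicode constant forces an implicit conversion of the column, which can turn an index seek into a scan.

    -- Hypothetical illustration: LastName is varchar and indexed.
    SELECT * FROM dbo.Student WHERE LastName = 'Smith';
        -- varchar constant: straightforward index seek
    SELECT * FROM dbo.Student WHERE LastName = N'Smith';
        -- nvarchar constant: the column is implicitly converted to
        -- nvarchar (higher type precedence), which can force a scan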

gbn
+1, if your app goes international, you'll have many other issues to worry about than a search/replace to nvarchar: multilingual text/messages, time zones, units of measure and currency
KM
+20  A: 

Be consistent! JOIN-ing a VARCHAR to NVARCHAR has a big performance hit.
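
A minimal sketch of the problem (schema is hypothetical): because nvarchar has higher data type precedence, the varchar side of the join is implicitly converted, which can prevent an index seek on it.

    -- Orders.CustomerCode is varchar(10); Customers.CustomerCode is nvarchar(10)
    SELECT o.OrderID
    FROM dbo.Orders o
    JOIN dbo.Customers c
        ON o.CustomerCode = c.CustomerCode;
        -- implicit CONVERT on o.CustomerCode; its index may not be used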

Thomas Harlan
+12  A: 

nvarchar is going to have significant overhead in memory, storage, working set and indexing, so if the specs dictate that it really will never be necessary, don't bother.

I would not have a hard and fast "always nvarchar" rule because it can be a complete waste in many situations - particularly for ETL from ASCII/EBCDIC sources, or for identifier and code columns, which are often keys and foreign keys.

On the other hand, there are plenty of columns where I would be sure to ask this question early, and if I didn't get a hard and fast answer immediately, I would make the column nvarchar.
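
A sketch of the kind of mixed schema this reasoning leads to (all names hypothetical): machine-facing codes stay varchar, human-entered text gets nvarchar.

    CREATE TABLE dbo.Product (
        ProductCode  varchar(20)   NOT NULL PRIMARY KEY, -- system-generated key
        CurrencyCode char(3)       NOT NULL,             -- ISO 4217, always ASCII
        Description  nvarchar(200) NOT NULL              -- user-entered, any language
    );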

Cade Roux
I like this answer best for this question.
JYelton
+1  A: 

I deal with this question at work often:

  • FTP feeds of inventory and pricing - item descriptions and other text were in nvarchar when varchar worked fine. Converting these to varchar reduced the file size by almost half and really helped with uploads.

  • The above scenario worked fine until someone put a special character in the item description (maybe a trademark symbol; I can't remember).

I still do not use nvarchar every time over varchar. If there is any doubt or potential for special characters, I use nvarchar. I find I use varchar mostly when I am in 100% control of what is populating the field.
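
A quick way to see the failure mode described above (purely illustrative, and assuming a Latin default collation): a character outside the column's code page is silently replaced with '?' when stored in varchar.

    DECLARE @v varchar(50), @n nvarchar(50);
    SET @v = N'Widget 中';  -- the Chinese character has no single-byte equivalent
    SET @n = N'Widget 中';
    SELECT @v AS varchar_value,   -- 'Widget ?'  (data silently lost)
           @n AS nvarchar_value;  -- 'Widget 中' (preserved)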

K Richard
+6  A: 

For your application, nvarchar is fine because the database size is small. Saying "always use nvarchar" is a vast oversimplification. If you're not required to store things like Kanji or other non-Latin characters, use VARCHAR; it'll use a lot less space. My predecessor at my current job designed something using NVARCHAR when it wasn't needed. We recently switched it to VARCHAR and saved 15 GB on just that table (it was heavily written to). Furthermore, if you then have an index on that table and you want to include that column or make a composite index, you've just made your index file size larger.
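
One concrete instance of the index point, sketched with hypothetical names: SQL Server 2005 limits index keys to 900 bytes, so the same logical length can be indexable as varchar but not as nvarchar.

    CREATE TABLE dbo.Demo (
        A varchar(900)  NOT NULL,  -- 900 bytes: fits the 900-byte key limit
        B nvarchar(900) NOT NULL   -- up to 1800 bytes: exceeds it
    );
    CREATE INDEX IX_A ON dbo.Demo (A);  -- fine
    CREATE INDEX IX_B ON dbo.Demo (B);  -- created with a warning; inserts fail
                                        -- once a value exceeds 900 bytes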

Just be thoughtful in your decision; in SQL development and data definitions there rarely seems to be a "default answer" (other than avoiding cursors at all costs, of course).

WebMasterP
+3  A: 

Similar question here:

http://stackoverflow.com/questions/312170/is-varchar-like-totally-1990s

EDIT by le dorfier: which interestingly came to exactly the opposite conclusion.

Booji Boy
+3  A: 

For the last few years all of our projects have used NVARCHAR for everything, since all of these projects are multilingual. Imported data from external sources (e.g. an ASCII file, etc.) is up-converted to Unicode before being inserted into the database.

I've yet to encounter any performance-related issues from the larger indexes, etc. The indexes do use more memory, but memory is cheap.

Whether you use stored procedures or construct SQL on the fly, ensure that all string constants are prefixed with N (e.g. SET @foo = N'Hello world.';) so that the constant is also Unicode. This avoids any string type conversion at runtime.
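
A minimal sketch of that convention (procedure and table names are hypothetical):

    CREATE PROCEDURE dbo.FindStudent
        @LastName nvarchar(50)
    AS
    BEGIN
        SELECT StudentID, FirstName, LastName
        FROM dbo.Student
        WHERE LastName = @LastName     -- nvarchar = nvarchar: no conversion
          AND Status   = N'Active';    -- N prefix keeps the constant Unicode
    END;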

YMMV.

devstuff
+1  A: 

Why, in all this discussion, has there been no mention of UTF-8? Being able to store the full Unicode range of characters does not mean one always has to allocate two bytes per character (or "code point", to use the Unicode term). All of ASCII is valid UTF-8. Does SQL Server check that the text in VARCHAR() fields is strict ASCII (i.e. the top bit of each byte is zero)? I would hope not.

If you then want to store Unicode and want compatibility with older ASCII-only applications, I would think using VARCHAR() and UTF-8 would be the magic bullet: it only uses more space when it needs to.

For those of you unfamiliar with UTF-8, might I recommend a primer.

Tevya
What you are suggesting might work for some applications, but one must also consider the impact of an extra encoding layer on the way SQL text is processed. In particular, collations, searching, and pattern matching will be affected. And if reports are run against the database, standard reporting tools will not interpret the multi-byte characters correctly. And bulk imports and exports may be affected. I think that, over the long term, this scheme may be more trouble than it's worth.
Jeffrey L Whitledge