Is it just that nvarchar supports multibyte characters? If that is the case, is there really any point, other than storage concerns, in using varchar?

+30  A: 

nvarchar stores Unicode data while varchar stores ASCII data. They function identically, but nvarchar takes up twice as much space.

So it's using UTF-16? Why not UTF-8, which uses only as much space as needed?
knittl
@knittl - char, nchar, varchar, and nvarchar each take a length parameter which the database engine can use to optimize storage. UTF-16 (when sticking to the BMP) has a simple two-to-one relationship between bytes of storage and characters, which the database can take advantage of. The amount of storage required for a UTF-8 string of N characters is not as clear, and could result in wasted space, or unexpectedly truncated strings. (For chars outside the BMP, UTF-16 strings could also be truncated unexpectedly, but this is less common, esp. since many OS's/dev-platforms use UTF-16 internally.)
Jeffrey L Whitledge
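To make the byte arithmetic in the comment above concrete, here is a minimal Python sketch (not anything the database itself runs). The helper name `widths` and the sample characters are chosen purely for illustration. It shows that every BMP character costs exactly two bytes in UTF-16, while the UTF-8 cost varies from one to four bytes.

```python
# Sample characters, written as escapes so the code points are unambiguous.
samples = {
    "A": "Latin letter (ASCII)",
    "\u00e9": "accented Latin letter",
    "\u4e2d": "CJK ideograph (in the BMP)",
    "\U0001F600": "emoji (outside the BMP)",
}

def widths(ch):
    """Return (utf16_bytes, utf8_bytes) for a single character."""
    # "utf-16-le" is used so that no byte-order mark is counted.
    return len(ch.encode("utf-16-le")), len(ch.encode("utf-8"))

for ch, label in samples.items():
    u16, u8 = widths(ch)
    print(f"{label:28} UTF-16: {u16} bytes   UTF-8: {u8} bytes")
```

Only the last sample, which lies outside the BMP, breaks the two-bytes-per-character rule in UTF-16 (it needs a surrogate pair, so four bytes).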
By the way, they do not function identically...
sixlettervariables
I am completely dismayed that this answer, which contains at least two errors, is still the most highly ranked answer. varchar does not store ASCII, it stores an 8-bit encoding, selected at random by the person who installed the database in the middle of the night while partially intoxicated. nvarchar and varchar do not "function identically" since varchar does not function at all in lots of scenarios. nvarchar may take up twice as much space, but varchar is twice as slow, since it requires string conversions for every read and write (on those occasions when it actually works).
Jeffrey L Whitledge
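The "does not function at all" point can be sketched in Python. This is only an approximation of a lossy code-page conversion, not SQL Server's actual code path. It assumes the column's collation maps to Windows-1252, and the function name `store_as_varchar` is hypothetical.

```python
def store_as_varchar(text, code_page="cp1252"):
    """Simulate a round trip through an 8-bit code page.

    Characters the code page cannot represent are replaced with '?',
    which approximates what a lossy conversion does to them.
    """
    return text.encode(code_page, errors="replace").decode(code_page)

print(store_as_varchar("caf\u00e9"))      # café  (é exists in Windows-1252)
print(store_as_varchar("\u4e2d\u6587"))   # ??    (Chinese text is lost)
```

Once the characters have been replaced, the original text cannot be recovered from the stored value.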
Your comment itself contains an error, Jeffrey, but I've overlooked it and given you an upvote anyway because it's otherwise brilliant. Your error is the use of the word 'partially' in the second sentence. :)
Cowan
A: 

nvarchar lets you store Unicode characters. It is the way to go if you want to store localized data.

Vijesh VP
A: 

Here is an ok discussion of this.

Ólafur Waage
+1  A: 

nvarchar stores data as Unicode, so if you're going to store multilingual data (more than one language) in a column, you need the N variant.

AlbertEin
+31  A: 

varchar: Variable-length, non-Unicode character data. The database collation determines which code page is used to store the data.

nvarchar: Variable-length Unicode character data. Dependent on the database collation for comparisons.

Armed with this knowledge, use whichever one matches your input data (ASCII v. Unicode).

sixlettervariables
+7  A: 

I always use nvarchar, as it allows whatever I'm building to withstand pretty much any data I throw at it. My CMS does Chinese by accident, because I used nvarchar. These days, any new apps shouldn't really be concerned with the amount of space required.

tags2k
The idea that new apps shouldn't be concerned with space restrictions is somewhat short-sighted and, as anyone who has dealt with databases at the medium-to-large enterprise level will be happy to tell you, completely incorrect.
Frater
To take the liberty of putting words in tags2k's mouth, I think a more accurate statement might be 'it's increasingly unlikely that any new apps should be more concerned about the space required than they should be about internationalisation and other character set issues'.
Cowan
Thanks Cowan, that is what I meant... over 2 years ago. Holy smokes!
tags2k
+2  A: 

You're right. nvarchar stores Unicode data while varchar stores single-byte character data. Other than storage differences (nvarchar requires twice the storage space of varchar), which you already mentioned, the main reason for preferring nvarchar over varchar would be internationalization (i.e. storing strings in other languages).

Mike Spross
+13  A: 

An nvarchar column can store any Unicode data. A varchar column is restricted to an 8-bit codepage. Some people think that varchar should be used because it takes up less space. I believe this is not the correct answer. Codepage incompatibilities are a pain, and Unicode is the cure for codepage problems. With cheap disk and memory nowadays, there is really no reason to waste time mucking around with code pages anymore.

All modern operating systems and development platforms use Unicode internally. By using nvarchar rather than varchar, you can avoid doing encoding conversions every time you read from or write to the database. Conversions take time, and are prone to errors. And recovery from conversion errors is a non-trivial problem.

If you are interfacing with an application that uses only ASCII, I would still recommend using Unicode in the database. The OS and database collation algorithms will work better with Unicode. Unicode avoids conversion problems when interfacing with other systems. And you will be preparing for the future. And you can always validate that your data is restricted to 7-bit ASCII for whatever legacy system you're having to maintain, even while enjoying some of the benefits of full Unicode storage.

Jeffrey L Whitledge
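The "validate that your data is restricted to 7-bit ASCII" step could look something like the following Python sketch, run at the application layer before handing a value to the legacy system. The function name `is_7bit_ascii` is made up for illustration, and where the check lives (application code, a constraint, a trigger) is an assumption the answer leaves open.

```python
def is_7bit_ascii(text):
    """Return True if every character has a code point below 128."""
    return all(ord(ch) < 128 for ch in text)

for value in ["Order #1234", "caf\u00e9"]:
    status = "ok for legacy system" if is_7bit_ascii(value) else "rejected"
    print(f"{value!r}: {status}")
```

The storage stays Unicode. Only the values bound for the ASCII-only consumer are checked.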
+1  A: 

@tags2k

These days, any new apps shouldn't really be concerned with the amount of space required.

If it was just a storage issue then you're probably right - especially for small apps - but here is a list of reasons you may want to choose varchar over nvarchar.

  • Your app is interfacing with an older app that uses ASCII data. If you store your data as ASCII too, there is one less thing to go wrong when you communicate with the older app.
  • You are storing vast numbers of records - half the size means you can back up your data twice as quickly and store twice as many backups.
  • If you are ever going to perform searches on your data, then half the size means your searches will run twice as fast.
  • You know you will only need ASCII data. You want your app to warn you if you're trying to store something else because it probably means something much worse is going on somewhere else!
VARCHAR on SQL Server != ASCII, rendering your first and fourth points moot. There's no VARCHAR code page which will only hold ASCII text.
Cowan
And some of the collations it DOES support are two-byte encodings, so points 2 and 3 are only correct for some character sets.
Cowan
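As a rough illustration of the two-byte point, here is a Python sketch using code page 932 (Shift-JIS, the code page behind the Japanese collations) as the example double-byte encoding. The helper name `stored_bytes` is hypothetical, and the byte counts are for the encoded text only, ignoring any per-row overhead in the database.

```python
def stored_bytes(text, encoding):
    """Number of bytes the text occupies in the given encoding."""
    return len(text.encode(encoding))

japanese = "\u65e5\u672c\u8a9e"  # three kanji: 日本語

print(stored_bytes(japanese, "cp932"))      # 6: two bytes per kanji
print(stored_bytes(japanese, "utf-16-le"))  # 6: no saving over UTF-16
print(stored_bytes("abc", "cp932"))         # 3: ASCII is still one byte each
```

So with a double-byte code page, the "half the size" argument holds only for the ASCII portion of the data.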
A: 

I would say, it depends.

If you develop a desktop application where the OS works in Unicode (like all current Windows systems) and the language natively supports Unicode (default strings are Unicode, as in Java or C#), then go with nvarchar.

If you develop a web application where strings come in as UTF-8 and the language is PHP, which still does not support Unicode natively (in versions 5.x), then varchar will probably be a better choice.

sleepy012
+1  A: 

It depends on how Oracle was installed. During the installation process, the NLS_CHARACTERSET option is set. You may be able to find it with the query SELECT value$ FROM sys.props$ WHERE name = 'NLS_CHARACTERSET'.

If your NLS_CHARACTERSET is a Unicode encoding like UTF8, great. VARCHAR and NVARCHAR are then pretty much identical. Stop reading now, just go for it. Otherwise, or if you have no control over the Oracle character set, read on.

VARCHAR — Data is stored in the NLS_CHARACTERSET encoding. If there are other database instances on the same server, you may be restricted by them; and vice versa, since you have to share the setting. Such a field can store any data that can be encoded using that character set, and nothing else. So for example if the character set is MS-1252, you can only store characters like English letters, a handful of accented letters, and a few others (like € and —). Your application would be useful only to a few locales, unable to operate anywhere else in the world. For this reason, it is considered A Bad Idea.

NVARCHAR — Data is stored in a Unicode encoding. Every language is supported. A Good Idea.

What about storage space? VARCHAR is generally efficient, since the character set / encoding was custom-designed for a specific locale. NVARCHAR fields store either in UTF-8 or UTF-16 encoding, based on the NLS setting, ironically enough. UTF-8 is very efficient for "Western" languages, while still supporting Asian languages. UTF-16 is very efficient for Asian languages, while still supporting "Western" languages. If concerned about storage space, pick an NLS setting to cause Oracle to use UTF-8 or UTF-16 as appropriate.
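The storage trade-off between the two encodings can be checked with a short Python sketch. The sample strings and the helper name `encoded_sizes` are illustrative assumptions, and the figures cover the encoded text only, not Oracle's actual on-disk layout.

```python
def encoded_sizes(text):
    """Return byte counts for the same text in UTF-8 and UTF-16."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),  # -le: no byte-order mark
    }

english = "Hello, world"
chinese = "\u4f60\u597d\uff0c\u4e16\u754c"  # 你好，世界

print(encoded_sizes(english))  # {'utf-8': 12, 'utf-16': 24}
print(encoded_sizes(chinese))  # {'utf-8': 15, 'utf-16': 10}
```

Which encoding is smaller therefore depends on what the bulk of the stored text looks like.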

What about processing speed? Most new coding platforms use Unicode natively (Java, .NET, even C++ std::wstring from years ago!), so if the database field is VARCHAR, Oracle is forced to convert between character sets on every read or write, which is not so good. Using NVARCHAR avoids the conversion.

Bottom line: Use NVARCHAR! It avoids limitations and dependencies, is fine for storage space, and usually best for performance too.

Jeremy Frank
This is a really good answer, except that the question is about sql-server.
stimms