Is it just that nvarchar supports multibyte characters? If that is the case is there really any point, other than storage concerns, to using varchars?
nvarchar stores Unicode data while varchar stores ASCII data. They function identically, but nvarchar takes up twice as much space.
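To put a number on "twice as much space", here is a rough sketch in Python (used only for illustration, not database code). It assumes varchar behaves like a single-byte code page such as cp1252 and nvarchar like UTF-16; `byte_sizes` is a made-up helper name.

```python
def byte_sizes(text):
    """Return (single_byte_size, utf16_size) for text that fits in cp1252."""
    # Stand-in for a varchar column: one byte per character.
    single_byte = len(text.encode("cp1252"))
    # Stand-in for an nvarchar column: two bytes per character for most text.
    utf16 = len(text.encode("utf-16-le"))
    return single_byte, utf16


print(byte_sizes("hello world"))  # (11, 22)
```

Actual on-disk size also depends on row overhead and any compression the database applies, so treat this only as the per-character ratio.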
nvarchar lets you store Unicode characters. It is the way to go if you want to store localized data.
nvarchar stores data as unicode, so, if you're going to store multilingual data (more than one language) in a data column you need the N variant.
varchar: Variable-length, non-Unicode character data. The database collation determines which code page the data is stored using.
nvarchar: Variable-length Unicode character data. Dependent on the database collation for comparisons.
Armed with this knowledge, use whichever one matches your input data (ASCII vs. Unicode).
I always use nvarchar, as it allows whatever I'm building to withstand pretty much any data I throw at it. My CMS handles Chinese by accident, because I used nvarchar. These days, any new apps shouldn't really be concerned with the amount of space required.
You're right. nvarchar stores Unicode data while varchar stores single-byte character data. Other than storage differences (nvarchar requires twice the storage space as varchar), which you already mentioned, the main reason for preferring nvarchar over varchar would be internationalization (i.e. storing strings in other languages).
An nvarchar column can store any Unicode data. A varchar column is restricted to an 8-bit codepage. Some people think that varchar should be used because it takes up less space. I believe this is not the correct answer. Codepage incompatibilities are a pain, and Unicode is the cure for codepage problems. With cheap disk and memory nowadays, there is really no reason to waste time mucking around with code pages anymore.
All modern operating systems and development platforms use Unicode internally. By using nvarchar rather than varchar, you can avoid doing encoding conversions every time you read from or write to the database. Conversions take time, and are prone to errors. And recovery from conversion errors is a non-trivial problem.
If you are interfacing with an application that uses only ASCII, I would still recommend using Unicode in the database. The OS and database collation algorithms will work better with Unicode. Unicode avoids conversion problems when interfacing with other systems. And you will be preparing for the future. And you can always validate that your data is restricted to 7-bit ASCII for whatever legacy system you're having to maintain, even while enjoying some of the benefits of full Unicode storage.
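One way to do that validation on the application side could look like the sketch below. It is Python for illustration only, and `is_seven_bit_ascii` is a hypothetical helper, not part of any database API.

```python
def is_seven_bit_ascii(text):
    """Return True if every character in text is in the 7-bit ASCII range."""
    try:
        text.encode("ascii")  # strict by default: raises on any non-ASCII char
    except UnicodeEncodeError:
        return False
    return True


print(is_seven_bit_ascii("ORDER-12345"))  # True
print(is_seven_bit_ascii("naïve"))        # False
```

The check would run before handing a value to the legacy system, while the column itself stays Unicode.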
@tags2k
These days, any new apps shouldn't really be concerned with the amount of space required.
If it was just a storage issue then you're probably right - especially for small apps - but here is a list of reasons you may want to choose varchar over nvarchar.
- Your app is interfacing with an older app that uses ASCII data. If you store your data as ASCII too, there is one less thing to go wrong when you communicate with the older app.
- You are storing vast numbers of records - half the size means you can back up your data twice as quickly and store twice as many backups.
- If you are ever going to perform searches on your data, then half the size means your searches will run twice as fast.
- You know you will only need ASCII data. You want your app to warn you if you're trying to store something else, because it probably means something much worse is going on somewhere else!
I would say, it depends.
If you develop a desktop application, where the OS works in Unicode (like all current Windows systems) and the language natively supports Unicode (default strings are Unicode, as in Java or C#), then go with nvarchar.
If you develop a web application, where strings come in as UTF-8, and the language is PHP, which still does not support Unicode natively (in versions 5.x), then varchar will probably be a better choice.
It depends on how Oracle was installed. During the installation process, the NLS_CHARACTERSET option is set. You may be able to find it with the query SELECT value$ FROM sys.props$ WHERE name = 'NLS_CHARACTERSET'.
If your NLS_CHARACTERSET is a Unicode encoding like UTF8, great. In that case VARCHAR and NVARCHAR are pretty much identical. Stop reading now, just go for it. Otherwise, or if you have no control over the Oracle character set, read on.
VARCHAR — Data is stored in the NLS_CHARACTERSET encoding. If there are other database instances on the same server, you may be restricted by them; and vice versa, since you have to share the setting. Such a field can store any data that can be encoded using that character set, and nothing else. So for example if the character set is MS-1252, you can only store characters like English letters, a handful of accented letters, and a few others (like € and —). Your application would be useful only to a few locales, unable to operate anywhere else in the world. For this reason, it is considered A Bad Idea.
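To see what "any data that can be encoded using that character set, and nothing else" means in practice, here is a small Python sketch (illustration only, not Oracle code). It uses Python's cp1252 codec as a stand-in for an MS-1252 database character set; `unstorable_chars` is a made-up helper.

```python
def unstorable_chars(text, charset="cp1252"):
    """Return the characters in text that the given charset cannot encode."""
    bad = []
    for ch in text:
        try:
            ch.encode(charset)
        except UnicodeEncodeError:
            bad.append(ch)
    return bad


# English letters, common accented letters and the euro sign all fit.
print(unstorable_chars("Café costs 5 €"))  # []

# Chinese characters have no cp1252 representation at all.
print(unstorable_chars("naïve 中文"))  # ['中', '文']
```

What the database does with such characters (reject them or substitute a placeholder) depends on the client and server configuration, so that part is deliberately left out of the sketch.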
NVARCHAR — Data is stored in a Unicode encoding. Every language is supported. A Good Idea.
What about storage space? VARCHAR is generally efficient, since the character set / encoding was custom-designed for a specific locale. NVARCHAR fields store either in UTF-8 or UTF-16 encoding, based on the NLS setting, ironically enough. UTF-8 is very efficient for "Western" languages, while still supporting Asian languages. UTF-16 is very efficient for Asian languages, while still supporting "Western" languages. If concerned about storage space, pick an NLS setting to cause Oracle to use UTF-8 or UTF-16 as appropriate.
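The UTF-8 versus UTF-16 trade-off is easy to check for your own data. A quick Python sketch (illustration only; `encoded_sizes` is a hypothetical helper, and the sample strings are arbitrary):

```python
def encoded_sizes(text):
    """Return the encoded byte length of text in UTF-8 and UTF-16."""
    return {
        "utf-8": len(text.encode("utf-8")),
        # "utf-16-le" omits the byte order mark, so only the text is counted.
        "utf-16": len(text.encode("utf-16-le")),
    }


print(encoded_sizes("database"))  # {'utf-8': 8, 'utf-16': 16}
print(encoded_sizes("数据库"))      # {'utf-8': 9, 'utf-16': 6}
```

Running something like this over a representative sample of real column values gives a better basis for the choice than a general rule.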
What about processing speed? Most new coding platforms use Unicode natively (Java, .NET, even C++ std::wstring from years ago!) so if the database field is VARCHAR it forces Oracle to convert between character sets on every read or write, not so good. Using NVARCHAR avoids the conversion.
Bottom line: Use NVARCHAR! It avoids limitations and dependencies, is fine for storage space, and usually best for performance too.