views:

753

answers:

4

How can you get MSSQL server to accept Unicode data by default into a VARCHAR or NVARCHAR column?

I know that you can do it by placing a N in front of the string to be placed in the field but to by quite honest this seems a bit archaic in 2008 and particuarily with using SQL Server 2005.

+1  A: 

If this is a web application, you could probably get your webserver to use UTF8 as it's default encoding. That way all data back and forth to the browser would be UTF8 which can be inserted into VARCHAR fields. UTF8 is a nice way to make applications that are not aware of Unicode deal with it.

bobwienholt
+2  A: 

The N syntax is how you specify a unicode string literal in SQL Server.

N'Unicode string'
'ANSI string'

SQL Server will auto convert between the two when possible, using either a column's collation or the database's collation.

So if your string literals don't actually contain unicode characters, you do not need to specify the N prefix.

But if your string literals do contain unicode characters then using the N prefix is necessary.

Brannon
+1  A: 

While you can simply store UTF8 content in a VARCHAR field in MSSQL Server as long as charset translation is not done you should be aware that:

  1. No management/reporting/data tools outside of your application will be able to understand your non-english characters.

  2. Language specific handling such as sorting a list of names may not be done in the order acceptable for every language.

  3. Must be careful about data truncation. Truncating a multi-byte UTF8 character ususally causes data corruption for the character involved. You should always reject input if it exceeds the field length.

  4. It may not be as easy as you think to disable charset translation..Even if you turn it off in your client driver it can still be overriden in some cases if there is a significant locale difference between client and RDBMS codepage used which instantly leads to data corruption.

  5. If you think this is all you will have to worry about your fooling yourself.

In summary while you might be tempted to go down this path its not a good idea. Code change is required when going multi-byte.

+1  A: 

They really need a way to turn off the need for the N'' prefix. The "it's needed for backwards compatibility" argument makes zero sense to me - sure, make that behavior the default for old apps, but provide an option for me to turn on Unicode strings by default (i.e, no N'' prefix required.) I'm discovering that I need to go and mess with large areas of my app to adapt to Unicode on SQL Server when this is NOT an issue in Oracle and Postgresql. C'mon, Microsoft!

Preston