I want to store English, French, German, Italian, and Spanish text in a SQL Server 2005 database to be used with a .NET application. Can I get away with not using Unicode? Will there be any issues with these languages?

+5  A: 

SQL Server 2008 R2 will introduce Unicode compression (see Unicode Compression in SQL Server 2008R2), which will make the storage cost of nvarchar versus varchar largely a problem of the past. You are still on SQL Server 2005, but you should program in the future tense.

The question of varchar vs. nvarchar is only one facet of the problem. The other facet is enforcing the proper collation, which matters for nvarchar just as much as for varchar. Since a column can have only one collation, the common solution is to split the data into a separate string table per language, declaring each table's columns with the collation appropriate to that language.
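A minimal sketch of that per-language layout, written as a Python helper that emits T-SQL DDL. The table names (`Strings_en`, `Strings_fr` and so on), the column names and the language-to-collation mapping are all hypothetical; check each collation name against `SELECT name FROM sys.fn_helpcollations();` on your own server before relying on it.

```python
# Hypothetical mapping from language code to a SQL Server collation name.
# Verify each name on your server with: SELECT name FROM sys.fn_helpcollations();
COLLATIONS = {
    "en": "Latin1_General_CI_AS",
    "fr": "French_CI_AS",
    "de": "Latin1_General_CI_AS",
    "it": "Latin1_General_CI_AS",
    "es": "Traditional_Spanish_CI_AS",
}


def string_table_ddl(lang: str) -> str:
    """Return CREATE TABLE DDL for one language's (hypothetical) string table.

    Each language gets its own table so that its text column can be declared
    with the collation that sorts and compares that language correctly.
    """
    collation = COLLATIONS[lang]
    return (
        f"CREATE TABLE dbo.Strings_{lang} (\n"
        f"    StringId INT NOT NULL PRIMARY KEY,\n"
        f"    Value NVARCHAR(4000) COLLATE {collation} NOT NULL\n"
        f");"
    )


if __name__ == "__main__":
    for lang in COLLATIONS:
        print(string_table_ddl(lang))
        print()
```

With this layout, a query against `Strings_es` gets Spanish ordering and equality on `Value` without a per-query `COLLATE` clause; the cost is one table per language to keep in sync.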

Update

There is a lengthy discussion of SQL Server 2005 international data at International Features in Microsoft SQL Server 2005. BTW, comments like 'just use UTF-8' miss the point. SQL Server stores nvarchar data encoded as UCS-2, and that's it, period. You can store XML data as UTF-8 or UTF-16, but no sane database person would recommend using XML to store your strings.

Also, while you may get away with an encoding like code page 1252, you will not get away so easily with a single collation, especially since you have Spanish as a requirement, and Spanish collations are notoriously problematic. For example, your Spanish-speaking users will expect 'Chiapas' to sort after 'Colima', but the Latin collation will sort 'Colima' after 'Chiapas' (see Working with Collations). Other problems appear in comparisons, where names that are different may compare as equal, again because of the wrong collation choice.
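To see why the Spanish users expect that order, here is a minimal Python sketch of a *traditional* Spanish primary sort key, in which "ch" and "ll" count as single letters (after "c" and "l") and "ñ" is a letter of its own after "n". It illustrates the rule only; it is not SQL Server's collation algorithm, and the names `traditional_spanish_key` and `TRADITIONAL_ALPHABET` are hypothetical. Keeping "ñ" distinct also means "Peña" and "Pena" do not compare as equal, which a naive accent-stripping key would get wrong.

```python
import unicodedata

# Traditional Spanish alphabet (hypothetical helper data): "ch" and "ll" are
# single letters, and "ñ" is a separate letter that follows "n".
TRADITIONAL_ALPHABET = [
    "a", "b", "c", "ch", "d", "e", "f", "g", "h", "i", "j", "k", "l", "ll",
    "m", "n", "ñ", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z",
]
RANK = {letter: i for i, letter in enumerate(TRADITIONAL_ALPHABET)}


def _fold(word: str) -> str:
    """Lower-case and drop accents (á -> a, ü -> u) but keep ñ as its own letter."""
    out = []
    for ch in word.lower():
        if ch == "ñ":
            out.append(ch)
        else:
            # NFD splits "á" into "a" + a combining accent; keep only the base.
            out.append(unicodedata.normalize("NFD", ch)[0])
    return "".join(out)


def traditional_spanish_key(word: str) -> list:
    """Primary sort key: case- and accent-insensitive, ch/ll as single letters."""
    text = _fold(word)
    key = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in ("ch", "ll"):
            key.append(RANK[pair])
            i += 2
        else:
            # Anything outside the alphabet (spaces, digits) ranks before "a".
            key.append(RANK.get(text[i], -1))
            i += 1
    return key


if __name__ == "__main__":
    states = ["Colima", "Chiapas", "Campeche"]
    print(sorted(states))                               # ['Campeche', 'Chiapas', 'Colima']
    print(sorted(states, key=traditional_spanish_key))  # ['Campeche', 'Colima', 'Chiapas']
```

A real collation also applies secondary (accent) and tertiary (case) rules to break ties; this sketch implements only the primary level.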

Remus Rusanu
Just adding some hard data: I did several storage and performance comparisons of the compression enhancements in 2008 R2: http://is.gd/4yleO
Aaron Bertrand
+4  A: 

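You can get away with not using Unicode, as long as your entire app assumes a fixed text encoding of windows-1252. It is a pure single-byte character set that covers the Western European alphabets, including all five of your languages. (ISO-8859-1 is close, but it lacks a few characters that windows-1252 has, such as €, Œ and œ.)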

However, you should seriously consider Unicode anyway, because sooner or later you will be asked to expand text storage beyond the limits of windows-1252. Not doing so would be like writing new code to store 2-digit years in the last decade of the 20th century.
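A quick way to test the windows-1252 assumption is to try encoding representative text. In this Python sketch the sample strings and the `fits_in` helper are made up for illustration; Python's standard `cp1252` codec stands in for windows-1252.

```python
# Hypothetical sample text for each language in the question.
SAMPLES = {
    "English": "naïve café résumé",
    "French": "Œuvre, garçon, élève, 10 €",
    "German": "Straße, Größe, Übung",
    "Italian": "città, perché, più",
    "Spanish": "¿Año nuevo? ¡Señor!",
}


def fits_in(codec: str, text: str) -> bool:
    """True if every character of text is representable in the codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    for language, text in SAMPLES.items():
        print(f"{language:8} cp1252={fits_in('cp1252', text)}")  # all True

    # A Polish city name (or any Greek, Cyrillic or CJK text) is where the
    # single-byte assumption breaks; UTF-16 handles it fine.
    print(fits_in("cp1252", "Łódź"))     # False
    print(fits_in("utf-16-le", "Łódź"))  # True
```

The first request for a Polish, Czech or Greek customer name is exactly the "sooner or later" case: every character in the five target languages fits, but the code page has no room to grow.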

Christian Hayter
+1  A: 

ISO-8859-15 should be enough for all your Western European language needs.

But I'd rather stick to UTF-8.
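The practical difference between ISO-8859-15 (Latin-9) and ISO-8859-1 (Latin-1) is a handful of characters, including the euro sign and the French Œ/œ. This Python sketch uses only the standard codecs to show which byte each code page assigns to a few of them; the helper name `byte_for` is hypothetical.

```python
CODECS = ("latin-1", "iso-8859-15", "cp1252")
CHARS = ["€", "Œ", "œ", "Ÿ", "½"]


def byte_for(ch: str, codec: str):
    """Return the single byte that codec uses for ch, or None if unmappable."""
    try:
        return ch.encode(codec)[0]
    except UnicodeEncodeError:
        return None


if __name__ == "__main__":
    for ch in CHARS:
        cells = []
        for codec in CODECS:
            b = byte_for(ch, codec)
            cells.append(f"{codec}: {'missing' if b is None else hex(b)}")
        print(ch, " | ".join(cells))
```

The trade goes both ways: Latin-9 reuses byte values that held characters such as ½ in Latin-1, so text containing ½ fits windows-1252 but not ISO-8859-15.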

Tordek
OK, show me how to use UTF-8 in SQL Server
gbn
A: 

I usually recommend using Unicode unless you know for sure that you'll never need it. Not using it limits the languages the database can support, and since everyone wants to do as much business as possible, it is usually better to start with Unicode than to try to switch to it later.

It does double the storage for those columns, but that usually isn't much to worry about.
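To put a number on "double", here is a Python sketch that counts the data bytes a string needs as windows-1252 (varchar) and as UTF-16 little-endian, which for these characters is byte-for-byte the same as UCS-2 (nvarchar). The helper names are hypothetical, and the count ignores row and column overhead as well as the Unicode compression mentioned for SQL Server 2008 R2.

```python
def varchar_bytes(text: str) -> int:
    """Bytes for text in a single-byte code page such as windows-1252."""
    return len(text.encode("cp1252"))


def nvarchar_bytes(text: str) -> int:
    """Bytes for text stored as UCS-2/UTF-16: two bytes per BMP character."""
    return len(text.encode("utf-16-le"))


if __name__ == "__main__":
    for word in ["Größe", "Señor", "perché"]:
        print(word, varchar_bytes(word), nvarchar_bytes(word))
    # Größe 5 10
    # Señor 5 10
    # perché 6 12
```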

mrdenny