views:

143

answers:

3

What should be the data type in SQL Server to store multibyte characters ? NVARCHAR or VARCHAR

+2  A: 

It should either be NCHAR or NVARCHAR... [Edit] ... after you convert these to UCS-2.

In both cases this stores Unicode characters (using UCS-2 characterset). Use the NCHAR for fields with a fixed length (fixed number of characters) and NVARCHAR for variable length fields.

The advantage of using UCS-2 is that all supporting logic is easier, since the characters all require exactly two bytes for storage. The disadvantage is that a lot of space gets wasted if most of the text uses mostly ASCII characters.

Thank you, Remus Rusanu, for pointing out that the Multi-byte strings are NOT directly compatible with Unicode UCS sets. If indeed you have say UTF-8 or UTF-16, it will not be stored or handle properly by SQL server before it is converted to UCS-2.

If one is only trying to store this data, without searching by it etc. VARBINARY format is indeed a good thing. Otherwise, one needs to convert as I suggest. (UCS-2 should handle most,but alas, not all, character sets out there).

mjv
MultiByte strings are not Unicode, nor Ascii. Storing MB string as either of these types would give very funky results.
Remus Rusanu
@Remus Rusanu. You are right. I didn't understand user wanted to store "as-is". I edited accordingly.
mjv
I don't know if the user does want to store ‘as is’ — terms like ‘multibyte’ and ‘Unicode’ characters are often thrown about without a precise understanding of what they really mean. On SQL Server, I would certainly try to avoid storing bytes and go with NVARCHAR if at all possible.
bobince
+2  A: 

If you want to store the original MB string, it get dicey. Your best choice is actually varbinary type, since neither varchar nor nvarchar are correct types for MB strings.

In your place I would convert the MB string to Unicode using MultiByteToWideChar and store it as NVARCHAR, and when needed in thr application again I would convert it back from Unicode to MB string using WideCharToMultiByte. This way the data in the database would be coherent for any other use that expects Unicode and/or Ascii strings, there aren't that many users prepared to deal with Multi-Bytes strings stored as byte arrays...

Remus Rusanu
A: 

Its possible to store UTF-8 in varchar fields if you really need to but there are a whole host of caveats.

  • No off-the-shelf data tools would be able to properly display the data without explicit conversions.

  • All character set translation MUST be disabled in the data drivers.

  • System must not accept truncated data unless it understands boundaries.

  • Sort operations may not be correct for all alphabets.

It works if you have no other choice and need to support an encoding like UTF8 without re-engineering your application and schema. Works quite well for web applications where input and output are browser based. Having said all this I'd still recommend going NVARCHAR and convert to UCS2.

Einstein