ansaurus

Question

Insert UTF8 data into a MS SQL-Server 2008

Answer 1

+3 A:

I think you have a misunderstanding of what encodings are. An encoding defines how to convert between a sequence of bytes and a string of characters. A String does not itself have an encoding associated with it.

Internally, .NET stores Strings in memory as UTF-16LE (which is why Windows persists in confusing everyone by calling the UTF-16LE encoding just “Unicode”). But you don't need to know that; to you, they're just strings of characters.

What your function does is:

1. Takes a string and converts it to UTF-8 bytes.
2. Takes those UTF-8 bytes and converts them to UTF-16LE bytes. (You could have just encoded straight to UTF-16LE instead of UTF-8 in step one.)
3. Takes those UTF-16LE bytes and converts them back to a string. This gives you the exact same String you had in the first place!

So this function is redundant; you can actually just pass a normal String to SQL Server from .NET and not worry about it.
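To see why the round trip changes nothing, here is a minimal sketch of those three steps. It is written in Python purely for illustration, not in .NET, and `round_trip` is a made-up name, not the asker's function.

```python
def round_trip(text: str) -> str:
    """Mimic the three steps: string -> UTF-8 -> UTF-16LE -> string."""
    # Step 1: encode the string to UTF-8 bytes.
    utf8_bytes = text.encode("utf-8")
    # Step 2: convert the UTF-8 bytes to UTF-16LE bytes
    # (decode back to characters, then re-encode).
    utf16_bytes = utf8_bytes.decode("utf-8").encode("utf-16-le")
    # Step 3: decode the UTF-16LE bytes back into a string.
    return utf16_bytes.decode("utf-16-le")


original = "Gdańsk café"
result = round_trip(original)
print(result == original)  # True: no character was lost or changed
```

Both encodings can represent every Unicode character, so going through them and back is lossless and the string comes out unchanged.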

The bit with the backslashes does do something, presumably application-specific; I don't understand what it's for. But nothing in that function will cause Windows to flatten characters like ń to n.

What /will/ cause that kind of flattening is trying to store characters that aren't in the database's own encoding. Presumably é is OK because that character is in your default encoding of cp1252 (Western European), but ń is not, so it gets mangled.
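A rough way to check which characters survive cp1252 is sketched below, again in Python for illustration only. The helper names are hypothetical, and `flatten_like_best_fit` only approximates the mangling by stripping accents; it is not SQL Server's actual conversion routine.

```python
import unicodedata


def fits_cp1252(text: str) -> bool:
    """Return True if every character can be encoded in cp1252."""
    try:
        text.encode("cp1252")
        return True
    except UnicodeEncodeError:
        return False


def flatten_like_best_fit(text: str) -> str:
    """Approximate the flattening: keep what fits, strip accents otherwise.

    Assumption: this imitates a best-fit conversion by removing combining
    marks. The real mapping used by the database may differ.
    """
    out = []
    for ch in text:
        if fits_cp1252(ch):
            out.append(ch)
            continue
        # Split into base letter + combining marks, then drop the marks.
        decomposed = unicodedata.normalize("NFD", ch)
        base = "".join(c for c in decomposed if not unicodedata.combining(c))
        out.append(base if base and fits_cp1252(base) else "?")
    return "".join(out)


print(fits_cp1252("é"))                      # True
print(fits_cp1252("ń"))                      # False
print(flatten_like_best_fit("Gdańsk café"))  # Gdansk café
```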

SQL Server does use ‘UCS2’ (really UTF-16LE again) to store Unicode strings, but you have to tell it to, typically by using a NATIONAL CHARACTER (NCHAR/NVARCHAR) column type instead of plain CHAR/VARCHAR.
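As a small illustration of the difference at the byte level (Python again, and only a sketch of the encodings involved, not of SQL Server itself): ń has a two-byte UTF-16LE representation, whereas a cp1252 encoder has nothing to map it to.

```python
text = "ń"  # U+0144, LATIN SMALL LETTER N WITH ACUTE

# What a UTF-16LE store (such as an NVARCHAR column) can hold for it.
utf16 = text.encode("utf-16-le")
print(utf16.hex())   # 4401
print(len(utf16))    # 2

# cp1252 has no slot for this character, so an encoder has to
# substitute something; Python's "replace" mode uses a question mark.
lossy = text.encode("cp1252", errors="replace")
print(lossy)         # b'?'

# The UTF-16LE bytes decode back to the original character.
print(utf16.decode("utf-16-le") == text)  # True
```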

bobince 2009-09-04 14:26:08

Yap, this encoding/Unicode/UTF stuff still gives me headaches. Anyways, you hit the nail on the head. After changing my column from varchar to nvarchar, the character is stored correctly. Many thanks!

Aaginor 2009-09-04 15:31:01

Answer 2

+1 A:

Hey, we were also very confused about encoding. Here's a useful page that explains it:

http://www.joelonsoftware.com/articles/Unicode.html

The answer to this question will help explain it too:

http://stackoverflow.com/questions/1426733/in-c-string-character-encoding-what-is-the-difference-between-getbytes-getstr

CraftyFella 2009-09-15 23:29:11

Yep, I already read Joel's article and agree with you that it's a pretty good one.

Aaginor 2009-09-17 09:06:52

Answer 3

A:

Convert data from SQL Server to a UTF-8 encoded file: http://www.xoowiki.com/Article/Batch/sql-server-utf-8-477.aspx

Sacha 2009-12-02 13:35:46
