ansaurus

Question

Answer 1

+1 A:

Even if the database is set to UTF8 you must do the following things to get Unicode fields to work correctly:

Ensure you are using a Unicode field type like NVARCHAR or TEXT CHARSET utf8
Whenever you insert anything into the field you must prefix it with the N character to indicate Unicode data as shown in the examples below
Whenever you select based on Unicode data ensure you use the N prefix again

MySqlCommand cmd = new MySqlCommand("INSERT INTO EXAMPLE (someField) VALUES (N'Unicode Data')");

MySqlCommand cmd2 = new MySqlCommand("SELECT * FROM EXAMPLE WHERE someField=N'Unicode Data'");

If the database wasn't configured correctly or the data was inserted without using the N prefix it won't be possible to get the correct data out since it will have been downcast into the Latin 1/ASCII character set

RobV 2009-06-03 07:57:29

I'm afraid that that downcast issue is the case.The tables are in UTF8, but the fields are just VARCHAR. Is there no way to "upcast" it to utf8 or whatever coding to get the data back?

Peter 2009-06-03 12:07:06

Strangely, the Ruby on Rails app that uses the database doesn't seem to have any problems with getting the data correctly

Peter 2009-06-03 12:21:23

Answer 2

A:

Try set the encoding by "set names utf8" query. You can set this parameter in mysql config too.

erenon 2009-06-03 08:43:20

Answer 3

A:

As others have said this could be a db issue, but it could also be caused by using an old version of the .net mysql connector.

What I actually wanted to comment on was the utf8 to utf16 conversion. The string you are trying to convert is actually alreay unicode encoded, so your "Ã«" characters actually takes up 4 bytes (or more) and are no longer, at the point of your conversion, a misrepresentation of the "ë" character. That is the reason why your conversion doesn't do anything. If you want to do a conversion like that I think you would have to encode your utf8 string as a old style 1 byte per character string, using a codepage where the byte values of Ã and « actually represent the utf8 byte sequence of ë and then treat the bytes of this new string as an utf8 string. Fun stuff.

JJJ 2009-06-05 20:37:32

ansaurus

tags:

views:

answers:

MySQL C# Text Encoding Problems

related questions